{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "wSFbIMb87cHu"
      },
      "source": [
        "# **Bioinformatics Project - Computational Drug Discovery [Part 1] Download Bioactivity Data (Concised version)**\n",
        "\n",
        "Chanin Nantasenamat\n",
        "\n",
        "[*'Data Professor' YouTube channel*](http://youtube.com/dataprofessor)\n",
        "\n",
        "In this Jupyter notebook, we will be building a real-life **data science project** that you can include in your **data science portfolio**. Particularly, we will be building a machine learning model using the ChEMBL bioactivity data.\n",
        "\n",
        "In **Part 1**, we will be performing Data Collection and Pre-Processing from the ChEMBL Database.\n",
        "\n",
        "Note for this Concised Version:\n",
        "* Redundant code cells were deleted.\n",
        "* Code cells for saving files to Google Drive has been deleted.\n",
        "\n",
        "---"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "3iQiERxumDor"
      },
      "source": [
        "## **ChEMBL Database**\n",
        "\n",
        "The [*ChEMBL Database*](https://www.ebi.ac.uk/chembl/) is a database that contains curated bioactivity data of more than 2 million compounds. It is compiled from more than 76,000 documents, 1.2 million assays and the data spans 13,000 targets and 1,800 cells and 33,000 indications.\n",
        "[Data as of March 25, 2020; ChEMBL version 26]."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "iryGAwAIQ4yf"
      },
      "source": [
        "## **Installing libraries**"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "toGT1U_B7F2i"
      },
      "source": [
        "Install the ChEMBL web service package so that we can retrieve bioactivity data from the ChEMBL Database."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 1,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "cJGExHQBfLh7",
        "outputId": "0af2e9e8-b783-42f9-83be-da79d80d7896"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Collecting chembl_webresource_client\n",
            "  Downloading chembl_webresource_client-0.10.7-py3-none-any.whl (55 kB)\n",
            "\u001b[?25l\r\u001b[K     |██████                          | 10 kB 24.0 MB/s eta 0:00:01\r\u001b[K     |███████████▉                    | 20 kB 13.0 MB/s eta 0:00:01\r\u001b[K     |█████████████████▊              | 30 kB 10.2 MB/s eta 0:00:01\r\u001b[K     |███████████████████████▋        | 40 kB 8.9 MB/s eta 0:00:01\r\u001b[K     |█████████████████████████████▌  | 51 kB 5.1 MB/s eta 0:00:01\r\u001b[K     |████████████████████████████████| 55 kB 2.4 MB/s \n",
            "\u001b[?25hRequirement already satisfied: requests>=2.18.4 in /usr/local/lib/python3.7/dist-packages (from chembl_webresource_client) (2.23.0)\n",
            "Requirement already satisfied: easydict in /usr/local/lib/python3.7/dist-packages (from chembl_webresource_client) (1.9)\n",
            "Collecting requests-cache~=0.7.0\n",
            "  Downloading requests_cache-0.7.5-py3-none-any.whl (39 kB)\n",
            "Requirement already satisfied: urllib3 in /usr/local/lib/python3.7/dist-packages (from chembl_webresource_client) (1.24.3)\n",
            "Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests>=2.18.4->chembl_webresource_client) (2.10)\n",
            "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/dist-packages (from requests>=2.18.4->chembl_webresource_client) (2021.10.8)\n",
            "Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.7/dist-packages (from requests>=2.18.4->chembl_webresource_client) (3.0.4)\n",
            "Requirement already satisfied: attrs<22.0,>=21.2 in /usr/local/lib/python3.7/dist-packages (from requests-cache~=0.7.0->chembl_webresource_client) (21.2.0)\n",
            "Collecting itsdangerous>=2.0.1\n",
            "  Downloading itsdangerous-2.0.1-py3-none-any.whl (18 kB)\n",
            "Collecting pyyaml>=5.4\n",
            "  Downloading PyYAML-6.0-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (596 kB)\n",
            "\u001b[K     |████████████████████████████████| 596 kB 11.1 MB/s \n",
            "\u001b[?25hCollecting url-normalize<2.0,>=1.4\n",
            "  Downloading url_normalize-1.4.3-py2.py3-none-any.whl (6.8 kB)\n",
            "Requirement already satisfied: six in /usr/local/lib/python3.7/dist-packages (from url-normalize<2.0,>=1.4->requests-cache~=0.7.0->chembl_webresource_client) (1.15.0)\n",
            "Installing collected packages: url-normalize, pyyaml, itsdangerous, requests-cache, chembl-webresource-client\n",
            "  Attempting uninstall: pyyaml\n",
            "    Found existing installation: PyYAML 3.13\n",
            "    Uninstalling PyYAML-3.13:\n",
            "      Successfully uninstalled PyYAML-3.13\n",
            "  Attempting uninstall: itsdangerous\n",
            "    Found existing installation: itsdangerous 1.1.0\n",
            "    Uninstalling itsdangerous-1.1.0:\n",
            "      Successfully uninstalled itsdangerous-1.1.0\n",
            "\u001b[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.\n",
            "flask 1.1.4 requires itsdangerous<2.0,>=0.24, but you have itsdangerous 2.0.1 which is incompatible.\u001b[0m\n",
            "Successfully installed chembl-webresource-client-0.10.7 itsdangerous-2.0.1 pyyaml-6.0 requests-cache-0.7.5 url-normalize-1.4.3\n"
          ]
        }
      ],
      "source": [
        "! pip install chembl_webresource_client"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "J0kJjL8gb5nX"
      },
      "source": [
        "## **Importing libraries**"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 2,
      "metadata": {
        "id": "RXoCvMPPfNrv"
      },
      "outputs": [],
      "source": [
        "# Import necessary libraries\n",
        "import pandas as pd\n",
        "from chembl_webresource_client.new_client import new_client"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "1FgUai1bfigC"
      },
      "source": [
        "## **Search for Target protein**"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "7lBsDrD0gAqH"
      },
      "source": [
        "### **Target search for coronavirus**"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 3,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 852
        },
        "id": "Vxtp79so4ZjF",
        "outputId": "0bbddccc-70eb-42e0-a2b3-c5c15510cea0"
      },
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "\n",
              "  <div id=\"df-72c92a92-a4ed-4596-9e60-6b4ec25c6beb\">\n",
              "    <div class=\"colab-df-container\">\n",
              "      <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>cross_references</th>\n",
              "      <th>organism</th>\n",
              "      <th>pref_name</th>\n",
              "      <th>score</th>\n",
              "      <th>species_group_flag</th>\n",
              "      <th>target_chembl_id</th>\n",
              "      <th>target_components</th>\n",
              "      <th>target_type</th>\n",
              "      <th>tax_id</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>[]</td>\n",
              "      <td>Severe acute respiratory syndrome coronavirus 2</td>\n",
              "      <td>SARS-CoV-2</td>\n",
              "      <td>33.0</td>\n",
              "      <td>False</td>\n",
              "      <td>CHEMBL4303835</td>\n",
              "      <td>[]</td>\n",
              "      <td>ORGANISM</td>\n",
              "      <td>2697049.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>1</th>\n",
              "      <td>[]</td>\n",
              "      <td>Severe acute respiratory syndrome-related coro...</td>\n",
              "      <td>SARS-CoV</td>\n",
              "      <td>32.0</td>\n",
              "      <td>False</td>\n",
              "      <td>CHEMBL4303836</td>\n",
              "      <td>[]</td>\n",
              "      <td>ORGANISM</td>\n",
              "      <td>694009.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2</th>\n",
              "      <td>[]</td>\n",
              "      <td>SARS coronavirus</td>\n",
              "      <td>SARS coronavirus</td>\n",
              "      <td>15.0</td>\n",
              "      <td>False</td>\n",
              "      <td>CHEMBL612575</td>\n",
              "      <td>[]</td>\n",
              "      <td>ORGANISM</td>\n",
              "      <td>227859.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>3</th>\n",
              "      <td>[]</td>\n",
              "      <td>Homo sapiens</td>\n",
              "      <td>Serine--tRNA ligase, cytoplasmic</td>\n",
              "      <td>14.0</td>\n",
              "      <td>False</td>\n",
              "      <td>CHEMBL4523232</td>\n",
              "      <td>[{'accession': 'P49591', 'component_descriptio...</td>\n",
              "      <td>SINGLE PROTEIN</td>\n",
              "      <td>9606.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>4</th>\n",
              "      <td>[{'xref_id': 'P0C6U8', 'xref_name': None, 'xre...</td>\n",
              "      <td>SARS coronavirus</td>\n",
              "      <td>SARS coronavirus 3C-like proteinase</td>\n",
              "      <td>11.0</td>\n",
              "      <td>False</td>\n",
              "      <td>CHEMBL3927</td>\n",
              "      <td>[{'accession': 'P0C6U8', 'component_descriptio...</td>\n",
              "      <td>SINGLE PROTEIN</td>\n",
              "      <td>227859.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>...</th>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2484</th>\n",
              "      <td>[]</td>\n",
              "      <td>Rattus norvegicus</td>\n",
              "      <td>Voltage-gated sodium channel</td>\n",
              "      <td>0.0</td>\n",
              "      <td>False</td>\n",
              "      <td>CHEMBL3988641</td>\n",
              "      <td>[{'accession': 'O88457', 'component_descriptio...</td>\n",
              "      <td>PROTEIN FAMILY</td>\n",
              "      <td>10116.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2485</th>\n",
              "      <td>[]</td>\n",
              "      <td>Homo sapiens</td>\n",
              "      <td>von Hippel-Lindau disease tumor suppressor/Elo...</td>\n",
              "      <td>0.0</td>\n",
              "      <td>False</td>\n",
              "      <td>CHEMBL4296117</td>\n",
              "      <td>[{'accession': 'Q16665', 'component_descriptio...</td>\n",
              "      <td>PROTEIN COMPLEX</td>\n",
              "      <td>9606.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2486</th>\n",
              "      <td>[]</td>\n",
              "      <td>Homo sapiens</td>\n",
              "      <td>UDP-glucuronosyltransferases (UGTs)</td>\n",
              "      <td>0.0</td>\n",
              "      <td>False</td>\n",
              "      <td>CHEMBL4523985</td>\n",
              "      <td>[{'accession': 'P22310', 'component_descriptio...</td>\n",
              "      <td>PROTEIN FAMILY</td>\n",
              "      <td>9606.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2487</th>\n",
              "      <td>[]</td>\n",
              "      <td>Mus musculus</td>\n",
              "      <td>I-kappa-B kinase</td>\n",
              "      <td>0.0</td>\n",
              "      <td>False</td>\n",
              "      <td>CHEMBL4524000</td>\n",
              "      <td>[{'accession': 'Q60680', 'component_descriptio...</td>\n",
              "      <td>PROTEIN COMPLEX</td>\n",
              "      <td>10090.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2488</th>\n",
              "      <td>[]</td>\n",
              "      <td>Rattus norvegicus</td>\n",
              "      <td>I-kappa-B kinase</td>\n",
              "      <td>0.0</td>\n",
              "      <td>False</td>\n",
              "      <td>CHEMBL4524001</td>\n",
              "      <td>[{'accession': 'Q9QY78', 'component_descriptio...</td>\n",
              "      <td>PROTEIN COMPLEX</td>\n",
              "      <td>10116.0</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "<p>2489 rows × 9 columns</p>\n",
              "</div>\n",
              "      <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-72c92a92-a4ed-4596-9e60-6b4ec25c6beb')\"\n",
              "              title=\"Convert this dataframe to an interactive table.\"\n",
              "              style=\"display:none;\">\n",
              "        \n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "       width=\"24px\">\n",
              "    <path d=\"M0 0h24v24H0V0z\" fill=\"none\"/>\n",
              "    <path d=\"M18.56 5.44l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94zm-11 1L8.5 8.5l.94-2.06 2.06-.94-2.06-.94L8.5 2.5l-.94 2.06-2.06.94zm10 10l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94z\"/><path d=\"M17.41 7.96l-1.37-1.37c-.4-.4-.92-.59-1.43-.59-.52 0-1.04.2-1.43.59L10.3 9.45l-7.72 7.72c-.78.78-.78 2.05 0 2.83L4 21.41c.39.39.9.59 1.41.59.51 0 1.02-.2 1.41-.59l7.78-7.78 2.81-2.81c.8-.78.8-2.07 0-2.86zM5.41 20L4 18.59l7.72-7.72 1.47 1.35L5.41 20z\"/>\n",
              "  </svg>\n",
              "      </button>\n",
              "      \n",
              "  <style>\n",
              "    .colab-df-container {\n",
              "      display:flex;\n",
              "      flex-wrap:wrap;\n",
              "      gap: 12px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert {\n",
              "      background-color: #E8F0FE;\n",
              "      border: none;\n",
              "      border-radius: 50%;\n",
              "      cursor: pointer;\n",
              "      display: none;\n",
              "      fill: #1967D2;\n",
              "      height: 32px;\n",
              "      padding: 0 0 0 0;\n",
              "      width: 32px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert:hover {\n",
              "      background-color: #E2EBFA;\n",
              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "      fill: #174EA6;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert {\n",
              "      background-color: #3B4455;\n",
              "      fill: #D2E3FC;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert:hover {\n",
              "      background-color: #434B5C;\n",
              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "      fill: #FFFFFF;\n",
              "    }\n",
              "  </style>\n",
              "\n",
              "      <script>\n",
              "        const buttonEl =\n",
              "          document.querySelector('#df-72c92a92-a4ed-4596-9e60-6b4ec25c6beb button.colab-df-convert');\n",
              "        buttonEl.style.display =\n",
              "          google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "        async function convertToInteractive(key) {\n",
              "          const element = document.querySelector('#df-72c92a92-a4ed-4596-9e60-6b4ec25c6beb');\n",
              "          const dataTable =\n",
              "            await google.colab.kernel.invokeFunction('convertToInteractive',\n",
              "                                                     [key], {});\n",
              "          if (!dataTable) return;\n",
              "\n",
              "          const docLinkHtml = 'Like what you see? Visit the ' +\n",
              "            '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
              "            + ' to learn more about interactive tables.';\n",
              "          element.innerHTML = '';\n",
              "          dataTable['output_type'] = 'display_data';\n",
              "          await google.colab.output.renderOutput(dataTable, element);\n",
              "          const docLink = document.createElement('div');\n",
              "          docLink.innerHTML = docLinkHtml;\n",
              "          element.appendChild(docLink);\n",
              "        }\n",
              "      </script>\n",
              "    </div>\n",
              "  </div>\n",
              "  "
            ],
            "text/plain": [
              "                                       cross_references  ...     tax_id\n",
              "0                                                    []  ...  2697049.0\n",
              "1                                                    []  ...   694009.0\n",
              "2                                                    []  ...   227859.0\n",
              "3                                                    []  ...     9606.0\n",
              "4     [{'xref_id': 'P0C6U8', 'xref_name': None, 'xre...  ...   227859.0\n",
              "...                                                 ...  ...        ...\n",
              "2484                                                 []  ...    10116.0\n",
              "2485                                                 []  ...     9606.0\n",
              "2486                                                 []  ...     9606.0\n",
              "2487                                                 []  ...    10090.0\n",
              "2488                                                 []  ...    10116.0\n",
              "\n",
              "[2489 rows x 9 columns]"
            ]
          },
          "metadata": {},
          "execution_count": 3
        }
      ],
      "source": [
        "# Target search for coronavirus\n",
        "target = new_client.target\n",
        "target_query = target.search('sars-cov-2')\n",
        "targets = pd.DataFrame.from_dict(target_query)\n",
        "targets"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Y5OPfEALjAfZ"
      },
      "source": [
        "### **Select and retrieve bioactivity data for *SARS coronavirus 3C-like proteinase* (fifth entry)**"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "gSQ3aroOgML7"
      },
      "source": [
        "We will assign the fifth entry (which corresponds to the target protein, *coronavirus 3C-like proteinase*) to the ***selected_target*** variable "
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 4,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 35
        },
        "id": "StrcHMVLha7u",
        "outputId": "26b89de5-6f1a-4abb-fb94-1e92a068be15"
      },
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "application/vnd.google.colaboratory.intrinsic+json": {
              "type": "string"
            },
            "text/plain": [
              "'CHEMBL4303835'"
            ]
          },
          "metadata": {},
          "execution_count": 4
        }
      ],
      "source": [
        "selected_target = targets.target_chembl_id[0]\n",
        "selected_target"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "GWd2DRalgjzB"
      },
      "source": [
        "Here, we will retrieve only bioactivity data for *coronavirus 3C-like proteinase* (CHEMBL3927) that are reported as IC$_{50}$ values in nM (nanomolar) unit."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 10,
      "metadata": {
        "id": "LeFbV_CsSP8D"
      },
      "outputs": [],
      "source": [
        "activity = new_client.activity\n",
        "res = activity.filter(target_chembl_id=selected_target).filter(standard_type=\"IC50\")"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 11,
      "metadata": {
        "id": "RC4T-NEmSWV-"
      },
      "outputs": [],
      "source": [
        "df = pd.DataFrame.from_dict(res)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 14,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 1000
        },
        "id": "lxCsx5bFnd67",
        "outputId": "3abc4552-6d1f-4a7f-d071-db53499c6aa6"
      },
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "\n",
              "  <div id=\"df-c5db75ac-f2f7-4ae0-a26a-25a0aeff1277\">\n",
              "    <div class=\"colab-df-container\">\n",
              "      <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>activity_comment</th>\n",
              "      <th>activity_id</th>\n",
              "      <th>activity_properties</th>\n",
              "      <th>assay_chembl_id</th>\n",
              "      <th>assay_description</th>\n",
              "      <th>assay_type</th>\n",
              "      <th>assay_variant_accession</th>\n",
              "      <th>assay_variant_mutation</th>\n",
              "      <th>bao_endpoint</th>\n",
              "      <th>bao_format</th>\n",
              "      <th>bao_label</th>\n",
              "      <th>canonical_smiles</th>\n",
              "      <th>data_validity_comment</th>\n",
              "      <th>data_validity_description</th>\n",
              "      <th>document_chembl_id</th>\n",
              "      <th>document_journal</th>\n",
              "      <th>document_year</th>\n",
              "      <th>ligand_efficiency</th>\n",
              "      <th>molecule_chembl_id</th>\n",
              "      <th>molecule_pref_name</th>\n",
              "      <th>parent_molecule_chembl_id</th>\n",
              "      <th>pchembl_value</th>\n",
              "      <th>potential_duplicate</th>\n",
              "      <th>qudt_units</th>\n",
              "      <th>record_id</th>\n",
              "      <th>relation</th>\n",
              "      <th>src_id</th>\n",
              "      <th>standard_flag</th>\n",
              "      <th>standard_relation</th>\n",
              "      <th>standard_text_value</th>\n",
              "      <th>standard_type</th>\n",
              "      <th>standard_units</th>\n",
              "      <th>standard_upper_value</th>\n",
              "      <th>standard_value</th>\n",
              "      <th>target_chembl_id</th>\n",
              "      <th>target_organism</th>\n",
              "      <th>target_pref_name</th>\n",
              "      <th>target_tax_id</th>\n",
              "      <th>text_value</th>\n",
              "      <th>toid</th>\n",
              "      <th>type</th>\n",
              "      <th>units</th>\n",
              "      <th>uo_units</th>\n",
              "      <th>upper_value</th>\n",
              "      <th>value</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>None</td>\n",
              "      <td>18827175</td>\n",
              "      <td>[]</td>\n",
              "      <td>CHEMBL4303812</td>\n",
              "      <td>Antiviral activity against SARS-CoV-2 (viral t...</td>\n",
              "      <td>F</td>\n",
              "      <td>None</td>\n",
              "      <td>None</td>\n",
              "      <td>BAO_0000190</td>\n",
              "      <td>BAO_0000218</td>\n",
              "      <td>organism-based format</td>\n",
              "      <td>CCN1CCN(Cc2ccc(Nc3ncc(F)c(-c4cc(F)c5nc(C)n(C(C...</td>\n",
              "      <td>None</td>\n",
              "      <td>None</td>\n",
              "      <td>CHEMBL4303084</td>\n",
              "      <td>None</td>\n",
              "      <td>2020</td>\n",
              "      <td>None</td>\n",
              "      <td>CHEMBL3301610</td>\n",
              "      <td>ABEMACICLIB</td>\n",
              "      <td>CHEMBL3301610</td>\n",
              "      <td>5.18</td>\n",
              "      <td>False</td>\n",
              "      <td>http://www.openphacts.org/units/Nanomolar</td>\n",
              "      <td>3133933</td>\n",
              "      <td>=</td>\n",
              "      <td>52</td>\n",
              "      <td>True</td>\n",
              "      <td>=</td>\n",
              "      <td>None</td>\n",
              "      <td>IC50</td>\n",
              "      <td>nM</td>\n",
              "      <td>None</td>\n",
              "      <td>6620.0</td>\n",
              "      <td>CHEMBL4303835</td>\n",
              "      <td>Severe acute respiratory syndrome coronavirus 2</td>\n",
              "      <td>SARS-CoV-2</td>\n",
              "      <td>2697049</td>\n",
              "      <td>None</td>\n",
              "      <td>None</td>\n",
              "      <td>IC50</td>\n",
              "      <td>uM</td>\n",
              "      <td>UO_0000065</td>\n",
              "      <td>None</td>\n",
              "      <td>6.62</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>1</th>\n",
              "      <td>None</td>\n",
              "      <td>18827176</td>\n",
              "      <td>[]</td>\n",
              "      <td>CHEMBL4303812</td>\n",
              "      <td>Antiviral activity against SARS-CoV-2 (viral t...</td>\n",
              "      <td>F</td>\n",
              "      <td>None</td>\n",
              "      <td>None</td>\n",
              "      <td>BAO_0000190</td>\n",
              "      <td>BAO_0000218</td>\n",
              "      <td>organism-based format</td>\n",
              "      <td>CCN(CC)Cc1cc(Nc2ccnc3cc(Cl)ccc23)ccc1O</td>\n",
              "      <td>None</td>\n",
              "      <td>None</td>\n",
              "      <td>CHEMBL4303084</td>\n",
              "      <td>None</td>\n",
              "      <td>2020</td>\n",
              "      <td>None</td>\n",
              "      <td>CHEMBL682</td>\n",
              "      <td>AMODIAQUINE</td>\n",
              "      <td>CHEMBL682</td>\n",
              "      <td>5.29</td>\n",
              "      <td>False</td>\n",
              "      <td>http://www.openphacts.org/units/Nanomolar</td>\n",
              "      <td>3133934</td>\n",
              "      <td>=</td>\n",
              "      <td>52</td>\n",
              "      <td>True</td>\n",
              "      <td>=</td>\n",
              "      <td>None</td>\n",
              "      <td>IC50</td>\n",
              "      <td>nM</td>\n",
              "      <td>None</td>\n",
              "      <td>5150.0</td>\n",
              "      <td>CHEMBL4303835</td>\n",
              "      <td>Severe acute respiratory syndrome coronavirus 2</td>\n",
              "      <td>SARS-CoV-2</td>\n",
              "      <td>2697049</td>\n",
              "      <td>None</td>\n",
              "      <td>None</td>\n",
              "      <td>IC50</td>\n",
              "      <td>uM</td>\n",
              "      <td>UO_0000065</td>\n",
              "      <td>None</td>\n",
              "      <td>5.15</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2</th>\n",
              "      <td>None</td>\n",
              "      <td>18827177</td>\n",
              "      <td>[]</td>\n",
              "      <td>CHEMBL4303812</td>\n",
              "      <td>Antiviral activity against SARS-CoV-2 (viral t...</td>\n",
              "      <td>F</td>\n",
              "      <td>None</td>\n",
              "      <td>None</td>\n",
              "      <td>BAO_0000190</td>\n",
              "      <td>BAO_0000218</td>\n",
              "      <td>organism-based format</td>\n",
              "      <td>CCCCCOc1ccc(-c2ccc(-c3ccc(C(=O)N[C@H]4C[C@@H](...</td>\n",
              "      <td>None</td>\n",
              "      <td>None</td>\n",
              "      <td>CHEMBL4303084</td>\n",
              "      <td>None</td>\n",
              "      <td>2020</td>\n",
              "      <td>None</td>\n",
              "      <td>CHEMBL264241</td>\n",
              "      <td>ANIDULAFUNGIN</td>\n",
              "      <td>CHEMBL264241</td>\n",
              "      <td>5.33</td>\n",
              "      <td>False</td>\n",
              "      <td>http://www.openphacts.org/units/Nanomolar</td>\n",
              "      <td>3133935</td>\n",
              "      <td>=</td>\n",
              "      <td>52</td>\n",
              "      <td>True</td>\n",
              "      <td>=</td>\n",
              "      <td>None</td>\n",
              "      <td>IC50</td>\n",
              "      <td>nM</td>\n",
              "      <td>None</td>\n",
              "      <td>4640.0</td>\n",
              "      <td>CHEMBL4303835</td>\n",
              "      <td>Severe acute respiratory syndrome coronavirus 2</td>\n",
              "      <td>SARS-CoV-2</td>\n",
              "      <td>2697049</td>\n",
              "      <td>None</td>\n",
              "      <td>None</td>\n",
              "      <td>IC50</td>\n",
              "      <td>uM</td>\n",
              "      <td>UO_0000065</td>\n",
              "      <td>None</td>\n",
              "      <td>4.64</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>3</th>\n",
              "      <td>None</td>\n",
              "      <td>18827178</td>\n",
              "      <td>[]</td>\n",
              "      <td>CHEMBL4303812</td>\n",
              "      <td>Antiviral activity against SARS-CoV-2 (viral t...</td>\n",
              "      <td>F</td>\n",
              "      <td>None</td>\n",
              "      <td>None</td>\n",
              "      <td>BAO_0000190</td>\n",
              "      <td>BAO_0000218</td>\n",
              "      <td>organism-based format</td>\n",
              "      <td>Cc1c(-c2ccc(O)cc2)n(Cc2ccc(OCCN3CCCCCC3)cc2)c2...</td>\n",
              "      <td>None</td>\n",
              "      <td>None</td>\n",
              "      <td>CHEMBL4303084</td>\n",
              "      <td>None</td>\n",
              "      <td>2020</td>\n",
              "      <td>None</td>\n",
              "      <td>CHEMBL46740</td>\n",
              "      <td>BAZEDOXIFENE</td>\n",
              "      <td>CHEMBL46740</td>\n",
              "      <td>5.46</td>\n",
              "      <td>False</td>\n",
              "      <td>http://www.openphacts.org/units/Nanomolar</td>\n",
              "      <td>3133936</td>\n",
              "      <td>=</td>\n",
              "      <td>52</td>\n",
              "      <td>True</td>\n",
              "      <td>=</td>\n",
              "      <td>None</td>\n",
              "      <td>IC50</td>\n",
              "      <td>nM</td>\n",
              "      <td>None</td>\n",
              "      <td>3440.0</td>\n",
              "      <td>CHEMBL4303835</td>\n",
              "      <td>Severe acute respiratory syndrome coronavirus 2</td>\n",
              "      <td>SARS-CoV-2</td>\n",
              "      <td>2697049</td>\n",
              "      <td>None</td>\n",
              "      <td>None</td>\n",
              "      <td>IC50</td>\n",
              "      <td>uM</td>\n",
              "      <td>UO_0000065</td>\n",
              "      <td>None</td>\n",
              "      <td>3.44</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>4</th>\n",
              "      <td>None</td>\n",
              "      <td>18827179</td>\n",
              "      <td>[]</td>\n",
              "      <td>CHEMBL4303812</td>\n",
              "      <td>Antiviral activity against SARS-CoV-2 (viral t...</td>\n",
              "      <td>F</td>\n",
              "      <td>None</td>\n",
              "      <td>None</td>\n",
              "      <td>BAO_0000190</td>\n",
              "      <td>BAO_0000218</td>\n",
              "      <td>organism-based format</td>\n",
              "      <td>COc1cc2c3cc1Oc1c(OC)c(OC)cc4c1[C@@H](Cc1ccc(O)...</td>\n",
              "      <td>None</td>\n",
              "      <td>None</td>\n",
              "      <td>CHEMBL4303084</td>\n",
              "      <td>None</td>\n",
              "      <td>2020</td>\n",
              "      <td>None</td>\n",
              "      <td>CHEMBL504323</td>\n",
              "      <td>BERBAMINE</td>\n",
              "      <td>CHEMBL504323</td>\n",
              "      <td>5.10</td>\n",
              "      <td>False</td>\n",
              "      <td>http://www.openphacts.org/units/Nanomolar</td>\n",
              "      <td>3133937</td>\n",
              "      <td>=</td>\n",
              "      <td>52</td>\n",
              "      <td>True</td>\n",
              "      <td>=</td>\n",
              "      <td>None</td>\n",
              "      <td>IC50</td>\n",
              "      <td>nM</td>\n",
              "      <td>None</td>\n",
              "      <td>7870.0</td>\n",
              "      <td>CHEMBL4303835</td>\n",
              "      <td>Severe acute respiratory syndrome coronavirus 2</td>\n",
              "      <td>SARS-CoV-2</td>\n",
              "      <td>2697049</td>\n",
              "      <td>None</td>\n",
              "      <td>None</td>\n",
              "      <td>IC50</td>\n",
              "      <td>uM</td>\n",
              "      <td>UO_0000065</td>\n",
              "      <td>None</td>\n",
              "      <td>7.87</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>...</th>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>365</th>\n",
              "      <td>None</td>\n",
              "      <td>20154347</td>\n",
              "      <td>[]</td>\n",
              "      <td>CHEMBL4513083</td>\n",
              "      <td>Determination of IC50 values for inhibition of...</td>\n",
              "      <td>F</td>\n",
              "      <td>None</td>\n",
              "      <td>None</td>\n",
              "      <td>BAO_0000190</td>\n",
              "      <td>BAO_0000218</td>\n",
              "      <td>organism-based format</td>\n",
              "      <td>CCCCn1c(NC(=O)c2cccs2)c(C#N)c2nc3ccccc3nc21</td>\n",
              "      <td>None</td>\n",
              "      <td>None</td>\n",
              "      <td>CHEMBL4495565</td>\n",
              "      <td>None</td>\n",
              "      <td>2020</td>\n",
              "      <td>None</td>\n",
              "      <td>CHEMBL3967196</td>\n",
              "      <td>None</td>\n",
              "      <td>CHEMBL3967196</td>\n",
              "      <td>4.71</td>\n",
              "      <td>False</td>\n",
              "      <td>http://www.openphacts.org/units/Nanomolar</td>\n",
              "      <td>3362297</td>\n",
              "      <td>=</td>\n",
              "      <td>52</td>\n",
              "      <td>True</td>\n",
              "      <td>=</td>\n",
              "      <td>None</td>\n",
              "      <td>IC50</td>\n",
              "      <td>nM</td>\n",
              "      <td>None</td>\n",
              "      <td>19650.0</td>\n",
              "      <td>CHEMBL4303835</td>\n",
              "      <td>Severe acute respiratory syndrome coronavirus 2</td>\n",
              "      <td>SARS-CoV-2</td>\n",
              "      <td>2697049</td>\n",
              "      <td>None</td>\n",
              "      <td>None</td>\n",
              "      <td>IC50</td>\n",
              "      <td>uM</td>\n",
              "      <td>UO_0000065</td>\n",
              "      <td>None</td>\n",
              "      <td>19.65</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>366</th>\n",
              "      <td>None</td>\n",
              "      <td>20154348</td>\n",
              "      <td>[]</td>\n",
              "      <td>CHEMBL4513083</td>\n",
              "      <td>Determination of IC50 values for inhibition of...</td>\n",
              "      <td>F</td>\n",
              "      <td>None</td>\n",
              "      <td>None</td>\n",
              "      <td>BAO_0000190</td>\n",
              "      <td>BAO_0000218</td>\n",
              "      <td>organism-based format</td>\n",
              "      <td>Cc1c(C(=O)c2cccc3ccccc23)c2cccc3c2n1[C@H](CN1C...</td>\n",
              "      <td>None</td>\n",
              "      <td>None</td>\n",
              "      <td>CHEMBL4495565</td>\n",
              "      <td>None</td>\n",
              "      <td>2020</td>\n",
              "      <td>None</td>\n",
              "      <td>CHEMBL188</td>\n",
              "      <td>WIN-552122</td>\n",
              "      <td>CHEMBL188</td>\n",
              "      <td>None</td>\n",
              "      <td>False</td>\n",
              "      <td>http://www.openphacts.org/units/Nanomolar</td>\n",
              "      <td>3361526</td>\n",
              "      <td>&gt;</td>\n",
              "      <td>52</td>\n",
              "      <td>True</td>\n",
              "      <td>&gt;</td>\n",
              "      <td>None</td>\n",
              "      <td>IC50</td>\n",
              "      <td>nM</td>\n",
              "      <td>None</td>\n",
              "      <td>33000.0</td>\n",
              "      <td>CHEMBL4303835</td>\n",
              "      <td>Severe acute respiratory syndrome coronavirus 2</td>\n",
              "      <td>SARS-CoV-2</td>\n",
              "      <td>2697049</td>\n",
              "      <td>None</td>\n",
              "      <td>None</td>\n",
              "      <td>IC50</td>\n",
              "      <td>uM</td>\n",
              "      <td>UO_0000065</td>\n",
              "      <td>None</td>\n",
              "      <td>33.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>367</th>\n",
              "      <td>None</td>\n",
              "      <td>20154349</td>\n",
              "      <td>[]</td>\n",
              "      <td>CHEMBL4513083</td>\n",
              "      <td>Determination of IC50 values for inhibition of...</td>\n",
              "      <td>F</td>\n",
              "      <td>None</td>\n",
              "      <td>None</td>\n",
              "      <td>BAO_0000190</td>\n",
              "      <td>BAO_0000218</td>\n",
              "      <td>organism-based format</td>\n",
              "      <td>CNC(=O)Nc1ccc(-c2nc(N3CC4CCC(C3)O4)c3cnn(C4CCC...</td>\n",
              "      <td>None</td>\n",
              "      <td>None</td>\n",
              "      <td>CHEMBL4495565</td>\n",
              "      <td>None</td>\n",
              "      <td>2020</td>\n",
              "      <td>None</td>\n",
              "      <td>CHEMBL601661</td>\n",
              "      <td>None</td>\n",
              "      <td>CHEMBL601661</td>\n",
              "      <td>4.67</td>\n",
              "      <td>False</td>\n",
              "      <td>http://www.openphacts.org/units/Nanomolar</td>\n",
              "      <td>3363375</td>\n",
              "      <td>=</td>\n",
              "      <td>52</td>\n",
              "      <td>True</td>\n",
              "      <td>=</td>\n",
              "      <td>None</td>\n",
              "      <td>IC50</td>\n",
              "      <td>nM</td>\n",
              "      <td>None</td>\n",
              "      <td>21620.0</td>\n",
              "      <td>CHEMBL4303835</td>\n",
              "      <td>Severe acute respiratory syndrome coronavirus 2</td>\n",
              "      <td>SARS-CoV-2</td>\n",
              "      <td>2697049</td>\n",
              "      <td>None</td>\n",
              "      <td>None</td>\n",
              "      <td>IC50</td>\n",
              "      <td>uM</td>\n",
              "      <td>UO_0000065</td>\n",
              "      <td>None</td>\n",
              "      <td>21.62</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>368</th>\n",
              "      <td>None</td>\n",
              "      <td>20154350</td>\n",
              "      <td>[]</td>\n",
              "      <td>CHEMBL4513083</td>\n",
              "      <td>Determination of IC50 values for inhibition of...</td>\n",
              "      <td>F</td>\n",
              "      <td>None</td>\n",
              "      <td>None</td>\n",
              "      <td>BAO_0000190</td>\n",
              "      <td>BAO_0000218</td>\n",
              "      <td>organism-based format</td>\n",
              "      <td>CC(C)N1CCN(c2ccc(C(=O)c3c(-c4ccc(O)cc4)sc4cc(O...</td>\n",
              "      <td>None</td>\n",
              "      <td>None</td>\n",
              "      <td>CHEMBL4495565</td>\n",
              "      <td>None</td>\n",
              "      <td>2020</td>\n",
              "      <td>None</td>\n",
              "      <td>CHEMBL178334</td>\n",
              "      <td>None</td>\n",
              "      <td>CHEMBL178334</td>\n",
              "      <td>4.53</td>\n",
              "      <td>False</td>\n",
              "      <td>http://www.openphacts.org/units/Nanomolar</td>\n",
              "      <td>3368236</td>\n",
              "      <td>=</td>\n",
              "      <td>52</td>\n",
              "      <td>True</td>\n",
              "      <td>=</td>\n",
              "      <td>None</td>\n",
              "      <td>IC50</td>\n",
              "      <td>nM</td>\n",
              "      <td>None</td>\n",
              "      <td>29870.0</td>\n",
              "      <td>CHEMBL4303835</td>\n",
              "      <td>Severe acute respiratory syndrome coronavirus 2</td>\n",
              "      <td>SARS-CoV-2</td>\n",
              "      <td>2697049</td>\n",
              "      <td>None</td>\n",
              "      <td>None</td>\n",
              "      <td>IC50</td>\n",
              "      <td>uM</td>\n",
              "      <td>UO_0000065</td>\n",
              "      <td>None</td>\n",
              "      <td>29.87</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>369</th>\n",
              "      <td>None</td>\n",
              "      <td>20154351</td>\n",
              "      <td>[]</td>\n",
              "      <td>CHEMBL4513083</td>\n",
              "      <td>Determination of IC50 values for inhibition of...</td>\n",
              "      <td>F</td>\n",
              "      <td>None</td>\n",
              "      <td>None</td>\n",
              "      <td>BAO_0000190</td>\n",
              "      <td>BAO_0000218</td>\n",
              "      <td>organism-based format</td>\n",
              "      <td>Nc1ccc(C(=O)Nc2cccc(-c3nc(N4CCOCC4)c4oc5ncccc5...</td>\n",
              "      <td>None</td>\n",
              "      <td>None</td>\n",
              "      <td>CHEMBL4495565</td>\n",
              "      <td>None</td>\n",
              "      <td>2020</td>\n",
              "      <td>None</td>\n",
              "      <td>CHEMBL2178735</td>\n",
              "      <td>None</td>\n",
              "      <td>CHEMBL2178735</td>\n",
              "      <td>5.33</td>\n",
              "      <td>False</td>\n",
              "      <td>http://www.openphacts.org/units/Nanomolar</td>\n",
              "      <td>3366145</td>\n",
              "      <td>=</td>\n",
              "      <td>52</td>\n",
              "      <td>True</td>\n",
              "      <td>=</td>\n",
              "      <td>None</td>\n",
              "      <td>IC50</td>\n",
              "      <td>nM</td>\n",
              "      <td>None</td>\n",
              "      <td>4730.0</td>\n",
              "      <td>CHEMBL4303835</td>\n",
              "      <td>Severe acute respiratory syndrome coronavirus 2</td>\n",
              "      <td>SARS-CoV-2</td>\n",
              "      <td>2697049</td>\n",
              "      <td>None</td>\n",
              "      <td>None</td>\n",
              "      <td>IC50</td>\n",
              "      <td>uM</td>\n",
              "      <td>UO_0000065</td>\n",
              "      <td>None</td>\n",
              "      <td>4.73</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "<p>370 rows × 45 columns</p>\n",
              "</div>\n",
              "      <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-c5db75ac-f2f7-4ae0-a26a-25a0aeff1277')\"\n",
              "              title=\"Convert this dataframe to an interactive table.\"\n",
              "              style=\"display:none;\">\n",
              "        \n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "       width=\"24px\">\n",
              "    <path d=\"M0 0h24v24H0V0z\" fill=\"none\"/>\n",
              "    <path d=\"M18.56 5.44l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94zm-11 1L8.5 8.5l.94-2.06 2.06-.94-2.06-.94L8.5 2.5l-.94 2.06-2.06.94zm10 10l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94z\"/><path d=\"M17.41 7.96l-1.37-1.37c-.4-.4-.92-.59-1.43-.59-.52 0-1.04.2-1.43.59L10.3 9.45l-7.72 7.72c-.78.78-.78 2.05 0 2.83L4 21.41c.39.39.9.59 1.41.59.51 0 1.02-.2 1.41-.59l7.78-7.78 2.81-2.81c.8-.78.8-2.07 0-2.86zM5.41 20L4 18.59l7.72-7.72 1.47 1.35L5.41 20z\"/>\n",
              "  </svg>\n",
              "      </button>\n",
              "      \n",
              "  <style>\n",
              "    .colab-df-container {\n",
              "      display:flex;\n",
              "      flex-wrap:wrap;\n",
              "      gap: 12px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert {\n",
              "      background-color: #E8F0FE;\n",
              "      border: none;\n",
              "      border-radius: 50%;\n",
              "      cursor: pointer;\n",
              "      display: none;\n",
              "      fill: #1967D2;\n",
              "      height: 32px;\n",
              "      padding: 0 0 0 0;\n",
              "      width: 32px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert:hover {\n",
              "      background-color: #E2EBFA;\n",
              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "      fill: #174EA6;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert {\n",
              "      background-color: #3B4455;\n",
              "      fill: #D2E3FC;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert:hover {\n",
              "      background-color: #434B5C;\n",
              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "      fill: #FFFFFF;\n",
              "    }\n",
              "  </style>\n",
              "\n",
              "      <script>\n",
              "        const buttonEl =\n",
              "          document.querySelector('#df-c5db75ac-f2f7-4ae0-a26a-25a0aeff1277 button.colab-df-convert');\n",
              "        buttonEl.style.display =\n",
              "          google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "        async function convertToInteractive(key) {\n",
              "          const element = document.querySelector('#df-c5db75ac-f2f7-4ae0-a26a-25a0aeff1277');\n",
              "          const dataTable =\n",
              "            await google.colab.kernel.invokeFunction('convertToInteractive',\n",
              "                                                     [key], {});\n",
              "          if (!dataTable) return;\n",
              "\n",
              "          const docLinkHtml = 'Like what you see? Visit the ' +\n",
              "            '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
              "            + ' to learn more about interactive tables.';\n",
              "          element.innerHTML = '';\n",
              "          dataTable['output_type'] = 'display_data';\n",
              "          await google.colab.output.renderOutput(dataTable, element);\n",
              "          const docLink = document.createElement('div');\n",
              "          docLink.innerHTML = docLinkHtml;\n",
              "          element.appendChild(docLink);\n",
              "        }\n",
              "      </script>\n",
              "    </div>\n",
              "  </div>\n",
              "  "
            ],
            "text/plain": [
              "    activity_comment  activity_id  ... upper_value  value\n",
              "0               None     18827175  ...        None   6.62\n",
              "1               None     18827176  ...        None   5.15\n",
              "2               None     18827177  ...        None   4.64\n",
              "3               None     18827178  ...        None   3.44\n",
              "4               None     18827179  ...        None   7.87\n",
              "..               ...          ...  ...         ...    ...\n",
              "365             None     20154347  ...        None  19.65\n",
              "366             None     20154348  ...        None   33.0\n",
              "367             None     20154349  ...        None  21.62\n",
              "368             None     20154350  ...        None  29.87\n",
              "369             None     20154351  ...        None   4.73\n",
              "\n",
              "[370 rows x 45 columns]"
            ]
          },
          "metadata": {},
          "execution_count": 14
        }
      ],
      "source": [
        "df"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 319
        },
        "id": "s9iUAXFdSkoM",
        "outputId": "72fa5145-358b-4927-ae62-139afa2d1e0c"
      },
      "outputs": [
        {
          "data": {
            "text/html": [
              "<div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>activity_comment</th>\n",
              "      <th>activity_id</th>\n",
              "      <th>activity_properties</th>\n",
              "      <th>assay_chembl_id</th>\n",
              "      <th>assay_description</th>\n",
              "      <th>assay_type</th>\n",
              "      <th>assay_variant_accession</th>\n",
              "      <th>assay_variant_mutation</th>\n",
              "      <th>bao_endpoint</th>\n",
              "      <th>bao_format</th>\n",
              "      <th>bao_label</th>\n",
              "      <th>canonical_smiles</th>\n",
              "      <th>data_validity_comment</th>\n",
              "      <th>data_validity_description</th>\n",
              "      <th>document_chembl_id</th>\n",
              "      <th>document_journal</th>\n",
              "      <th>document_year</th>\n",
              "      <th>ligand_efficiency</th>\n",
              "      <th>molecule_chembl_id</th>\n",
              "      <th>molecule_pref_name</th>\n",
              "      <th>parent_molecule_chembl_id</th>\n",
              "      <th>pchembl_value</th>\n",
              "      <th>potential_duplicate</th>\n",
              "      <th>qudt_units</th>\n",
              "      <th>record_id</th>\n",
              "      <th>relation</th>\n",
              "      <th>src_id</th>\n",
              "      <th>standard_flag</th>\n",
              "      <th>standard_relation</th>\n",
              "      <th>standard_text_value</th>\n",
              "      <th>standard_type</th>\n",
              "      <th>standard_units</th>\n",
              "      <th>standard_upper_value</th>\n",
              "      <th>standard_value</th>\n",
              "      <th>target_chembl_id</th>\n",
              "      <th>target_organism</th>\n",
              "      <th>target_pref_name</th>\n",
              "      <th>target_tax_id</th>\n",
              "      <th>text_value</th>\n",
              "      <th>toid</th>\n",
              "      <th>type</th>\n",
              "      <th>units</th>\n",
              "      <th>uo_units</th>\n",
              "      <th>upper_value</th>\n",
              "      <th>value</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>None</td>\n",
              "      <td>1476106</td>\n",
              "      <td>[]</td>\n",
              "      <td>CHEMBL828004</td>\n",
              "      <td>Inhibitory concentration against selected kina...</td>\n",
              "      <td>B</td>\n",
              "      <td>None</td>\n",
              "      <td>None</td>\n",
              "      <td>BAO_0000190</td>\n",
              "      <td>BAO_0000357</td>\n",
              "      <td>single protein format</td>\n",
              "      <td>CCn1c(-c2nonc2N)nc2cnccc21</td>\n",
              "      <td>None</td>\n",
              "      <td>None</td>\n",
              "      <td>CHEMBL1139578</td>\n",
              "      <td>Bioorg. Med. Chem. Lett.</td>\n",
              "      <td>2005</td>\n",
              "      <td>{'bei': '30.22', 'le': '0.56', 'lle': '5.88', ...</td>\n",
              "      <td>CHEMBL189657</td>\n",
              "      <td>None</td>\n",
              "      <td>CHEMBL189657</td>\n",
              "      <td>6.96</td>\n",
              "      <td>False</td>\n",
              "      <td>http://www.openphacts.org/units/Nanomolar</td>\n",
              "      <td>382064</td>\n",
              "      <td>=</td>\n",
              "      <td>1</td>\n",
              "      <td>True</td>\n",
              "      <td>=</td>\n",
              "      <td>None</td>\n",
              "      <td>IC50</td>\n",
              "      <td>nM</td>\n",
              "      <td>None</td>\n",
              "      <td>110.0</td>\n",
              "      <td>CHEMBL2292</td>\n",
              "      <td>Homo sapiens</td>\n",
              "      <td>Dual-specificity tyrosine-phosphorylation regu...</td>\n",
              "      <td>9606</td>\n",
              "      <td>None</td>\n",
              "      <td>None</td>\n",
              "      <td>IC50</td>\n",
              "      <td>nM</td>\n",
              "      <td>UO_0000065</td>\n",
              "      <td>None</td>\n",
              "      <td>110.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>1</th>\n",
              "      <td>None</td>\n",
              "      <td>1476649</td>\n",
              "      <td>[]</td>\n",
              "      <td>CHEMBL828004</td>\n",
              "      <td>Inhibitory concentration against selected kina...</td>\n",
              "      <td>B</td>\n",
              "      <td>None</td>\n",
              "      <td>None</td>\n",
              "      <td>BAO_0000190</td>\n",
              "      <td>BAO_0000357</td>\n",
              "      <td>single protein format</td>\n",
              "      <td>CCn1c(-c2nonc2N)nc2cncc(CNC3CCNCC3)c21</td>\n",
              "      <td>None</td>\n",
              "      <td>None</td>\n",
              "      <td>CHEMBL1139578</td>\n",
              "      <td>Bioorg. Med. Chem. Lett.</td>\n",
              "      <td>2005</td>\n",
              "      <td>{'bei': '15.51', 'le': '0.29', 'lle': '4.39', ...</td>\n",
              "      <td>CHEMBL188434</td>\n",
              "      <td>None</td>\n",
              "      <td>CHEMBL188434</td>\n",
              "      <td>5.31</td>\n",
              "      <td>False</td>\n",
              "      <td>http://www.openphacts.org/units/Nanomolar</td>\n",
              "      <td>382011</td>\n",
              "      <td>=</td>\n",
              "      <td>1</td>\n",
              "      <td>True</td>\n",
              "      <td>=</td>\n",
              "      <td>None</td>\n",
              "      <td>IC50</td>\n",
              "      <td>nM</td>\n",
              "      <td>None</td>\n",
              "      <td>4900.0</td>\n",
              "      <td>CHEMBL2292</td>\n",
              "      <td>Homo sapiens</td>\n",
              "      <td>Dual-specificity tyrosine-phosphorylation regu...</td>\n",
              "      <td>9606</td>\n",
              "      <td>None</td>\n",
              "      <td>None</td>\n",
              "      <td>IC50</td>\n",
              "      <td>nM</td>\n",
              "      <td>UO_0000065</td>\n",
              "      <td>None</td>\n",
              "      <td>4900.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2</th>\n",
              "      <td>None</td>\n",
              "      <td>1701840</td>\n",
              "      <td>[]</td>\n",
              "      <td>CHEMBL861010</td>\n",
              "      <td>Inhibition of human recombinant DYRK1a</td>\n",
              "      <td>B</td>\n",
              "      <td>None</td>\n",
              "      <td>None</td>\n",
              "      <td>BAO_0000190</td>\n",
              "      <td>BAO_0000357</td>\n",
              "      <td>single protein format</td>\n",
              "      <td>O=c1oc2c(O)c(O)cc3c(=O)oc4c(O)c(O)cc1c4c23</td>\n",
              "      <td>None</td>\n",
              "      <td>None</td>\n",
              "      <td>CHEMBL1146093</td>\n",
              "      <td>J. Med. Chem.</td>\n",
              "      <td>2006</td>\n",
              "      <td>None</td>\n",
              "      <td>CHEMBL6246</td>\n",
              "      <td>ELLAGIC ACID</td>\n",
              "      <td>CHEMBL6246</td>\n",
              "      <td>None</td>\n",
              "      <td>False</td>\n",
              "      <td>http://www.openphacts.org/units/Nanomolar</td>\n",
              "      <td>448549</td>\n",
              "      <td>&gt;</td>\n",
              "      <td>1</td>\n",
              "      <td>True</td>\n",
              "      <td>&gt;</td>\n",
              "      <td>None</td>\n",
              "      <td>IC50</td>\n",
              "      <td>nM</td>\n",
              "      <td>None</td>\n",
              "      <td>40000.0</td>\n",
              "      <td>CHEMBL2292</td>\n",
              "      <td>Homo sapiens</td>\n",
              "      <td>Dual-specificity tyrosine-phosphorylation regu...</td>\n",
              "      <td>9606</td>\n",
              "      <td>None</td>\n",
              "      <td>None</td>\n",
              "      <td>IC50</td>\n",
              "      <td>uM</td>\n",
              "      <td>UO_0000065</td>\n",
              "      <td>None</td>\n",
              "      <td>40.0</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>"
            ],
            "text/plain": [
              "  activity_comment  activity_id  ... upper_value   value\n",
              "0             None      1476106  ...        None   110.0\n",
              "1             None      1476649  ...        None  4900.0\n",
              "2             None      1701840  ...        None    40.0\n",
              "\n",
              "[3 rows x 45 columns]"
            ]
          },
          "execution_count": 7,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "df.head(3)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "fQ78N26Fg15T"
      },
      "source": [
        "Finally we will save the resulting bioactivity data to a CSV file **bioactivity_data.csv**."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 12,
      "metadata": {
        "id": "ZvUUEIVxTOH1"
      },
      "outputs": [],
      "source": [
        "df.to_csv('sars-cov-2_bioactivity_data_raw.csv', index=False)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "_GXMpFNUOn_8"
      },
      "source": [
        "## **Handling missing data**\n",
        "If any compounds has missing value for the **standard_value** column then drop it"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 15,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 1000
        },
        "id": "hkVOdk6ZR396",
        "outputId": "3e43fd45-64b4-43b4-cf6c-5c5135c77cc6"
      },
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "\n",
              "  <div id=\"df-84115d41-2f6b-4a58-9622-da29e5b0d6fe\">\n",
              "    <div class=\"colab-df-container\">\n",
              "      <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>activity_comment</th>\n",
              "      <th>activity_id</th>\n",
              "      <th>activity_properties</th>\n",
              "      <th>assay_chembl_id</th>\n",
              "      <th>assay_description</th>\n",
              "      <th>assay_type</th>\n",
              "      <th>assay_variant_accession</th>\n",
              "      <th>assay_variant_mutation</th>\n",
              "      <th>bao_endpoint</th>\n",
              "      <th>bao_format</th>\n",
              "      <th>bao_label</th>\n",
              "      <th>canonical_smiles</th>\n",
              "      <th>data_validity_comment</th>\n",
              "      <th>data_validity_description</th>\n",
              "      <th>document_chembl_id</th>\n",
              "      <th>document_journal</th>\n",
              "      <th>document_year</th>\n",
              "      <th>ligand_efficiency</th>\n",
              "      <th>molecule_chembl_id</th>\n",
              "      <th>molecule_pref_name</th>\n",
              "      <th>parent_molecule_chembl_id</th>\n",
              "      <th>pchembl_value</th>\n",
              "      <th>potential_duplicate</th>\n",
              "      <th>qudt_units</th>\n",
              "      <th>record_id</th>\n",
              "      <th>relation</th>\n",
              "      <th>src_id</th>\n",
              "      <th>standard_flag</th>\n",
              "      <th>standard_relation</th>\n",
              "      <th>standard_text_value</th>\n",
              "      <th>standard_type</th>\n",
              "      <th>standard_units</th>\n",
              "      <th>standard_upper_value</th>\n",
              "      <th>standard_value</th>\n",
              "      <th>target_chembl_id</th>\n",
              "      <th>target_organism</th>\n",
              "      <th>target_pref_name</th>\n",
              "      <th>target_tax_id</th>\n",
              "      <th>text_value</th>\n",
              "      <th>toid</th>\n",
              "      <th>type</th>\n",
              "      <th>units</th>\n",
              "      <th>uo_units</th>\n",
              "      <th>upper_value</th>\n",
              "      <th>value</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>None</td>\n",
              "      <td>18827175</td>\n",
              "      <td>[]</td>\n",
              "      <td>CHEMBL4303812</td>\n",
              "      <td>Antiviral activity against SARS-CoV-2 (viral t...</td>\n",
              "      <td>F</td>\n",
              "      <td>None</td>\n",
              "      <td>None</td>\n",
              "      <td>BAO_0000190</td>\n",
              "      <td>BAO_0000218</td>\n",
              "      <td>organism-based format</td>\n",
              "      <td>CCN1CCN(Cc2ccc(Nc3ncc(F)c(-c4cc(F)c5nc(C)n(C(C...</td>\n",
              "      <td>None</td>\n",
              "      <td>None</td>\n",
              "      <td>CHEMBL4303084</td>\n",
              "      <td>None</td>\n",
              "      <td>2020</td>\n",
              "      <td>None</td>\n",
              "      <td>CHEMBL3301610</td>\n",
              "      <td>ABEMACICLIB</td>\n",
              "      <td>CHEMBL3301610</td>\n",
              "      <td>5.18</td>\n",
              "      <td>False</td>\n",
              "      <td>http://www.openphacts.org/units/Nanomolar</td>\n",
              "      <td>3133933</td>\n",
              "      <td>=</td>\n",
              "      <td>52</td>\n",
              "      <td>True</td>\n",
              "      <td>=</td>\n",
              "      <td>None</td>\n",
              "      <td>IC50</td>\n",
              "      <td>nM</td>\n",
              "      <td>None</td>\n",
              "      <td>6620.0</td>\n",
              "      <td>CHEMBL4303835</td>\n",
              "      <td>Severe acute respiratory syndrome coronavirus 2</td>\n",
              "      <td>SARS-CoV-2</td>\n",
              "      <td>2697049</td>\n",
              "      <td>None</td>\n",
              "      <td>None</td>\n",
              "      <td>IC50</td>\n",
              "      <td>uM</td>\n",
              "      <td>UO_0000065</td>\n",
              "      <td>None</td>\n",
              "      <td>6.62</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>1</th>\n",
              "      <td>None</td>\n",
              "      <td>18827176</td>\n",
              "      <td>[]</td>\n",
              "      <td>CHEMBL4303812</td>\n",
              "      <td>Antiviral activity against SARS-CoV-2 (viral t...</td>\n",
              "      <td>F</td>\n",
              "      <td>None</td>\n",
              "      <td>None</td>\n",
              "      <td>BAO_0000190</td>\n",
              "      <td>BAO_0000218</td>\n",
              "      <td>organism-based format</td>\n",
              "      <td>CCN(CC)Cc1cc(Nc2ccnc3cc(Cl)ccc23)ccc1O</td>\n",
              "      <td>None</td>\n",
              "      <td>None</td>\n",
              "      <td>CHEMBL4303084</td>\n",
              "      <td>None</td>\n",
              "      <td>2020</td>\n",
              "      <td>None</td>\n",
              "      <td>CHEMBL682</td>\n",
              "      <td>AMODIAQUINE</td>\n",
              "      <td>CHEMBL682</td>\n",
              "      <td>5.29</td>\n",
              "      <td>False</td>\n",
              "      <td>http://www.openphacts.org/units/Nanomolar</td>\n",
              "      <td>3133934</td>\n",
              "      <td>=</td>\n",
              "      <td>52</td>\n",
              "      <td>True</td>\n",
              "      <td>=</td>\n",
              "      <td>None</td>\n",
              "      <td>IC50</td>\n",
              "      <td>nM</td>\n",
              "      <td>None</td>\n",
              "      <td>5150.0</td>\n",
              "      <td>CHEMBL4303835</td>\n",
              "      <td>Severe acute respiratory syndrome coronavirus 2</td>\n",
              "      <td>SARS-CoV-2</td>\n",
              "      <td>2697049</td>\n",
              "      <td>None</td>\n",
              "      <td>None</td>\n",
              "      <td>IC50</td>\n",
              "      <td>uM</td>\n",
              "      <td>UO_0000065</td>\n",
              "      <td>None</td>\n",
              "      <td>5.15</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2</th>\n",
              "      <td>None</td>\n",
              "      <td>18827177</td>\n",
              "      <td>[]</td>\n",
              "      <td>CHEMBL4303812</td>\n",
              "      <td>Antiviral activity against SARS-CoV-2 (viral t...</td>\n",
              "      <td>F</td>\n",
              "      <td>None</td>\n",
              "      <td>None</td>\n",
              "      <td>BAO_0000190</td>\n",
              "      <td>BAO_0000218</td>\n",
              "      <td>organism-based format</td>\n",
              "      <td>CCCCCOc1ccc(-c2ccc(-c3ccc(C(=O)N[C@H]4C[C@@H](...</td>\n",
              "      <td>None</td>\n",
              "      <td>None</td>\n",
              "      <td>CHEMBL4303084</td>\n",
              "      <td>None</td>\n",
              "      <td>2020</td>\n",
              "      <td>None</td>\n",
              "      <td>CHEMBL264241</td>\n",
              "      <td>ANIDULAFUNGIN</td>\n",
              "      <td>CHEMBL264241</td>\n",
              "      <td>5.33</td>\n",
              "      <td>False</td>\n",
              "      <td>http://www.openphacts.org/units/Nanomolar</td>\n",
              "      <td>3133935</td>\n",
              "      <td>=</td>\n",
              "      <td>52</td>\n",
              "      <td>True</td>\n",
              "      <td>=</td>\n",
              "      <td>None</td>\n",
              "      <td>IC50</td>\n",
              "      <td>nM</td>\n",
              "      <td>None</td>\n",
              "      <td>4640.0</td>\n",
              "      <td>CHEMBL4303835</td>\n",
              "      <td>Severe acute respiratory syndrome coronavirus 2</td>\n",
              "      <td>SARS-CoV-2</td>\n",
              "      <td>2697049</td>\n",
              "      <td>None</td>\n",
              "      <td>None</td>\n",
              "      <td>IC50</td>\n",
              "      <td>uM</td>\n",
              "      <td>UO_0000065</td>\n",
              "      <td>None</td>\n",
              "      <td>4.64</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>3</th>\n",
              "      <td>None</td>\n",
              "      <td>18827178</td>\n",
              "      <td>[]</td>\n",
              "      <td>CHEMBL4303812</td>\n",
              "      <td>Antiviral activity against SARS-CoV-2 (viral t...</td>\n",
              "      <td>F</td>\n",
              "      <td>None</td>\n",
              "      <td>None</td>\n",
              "      <td>BAO_0000190</td>\n",
              "      <td>BAO_0000218</td>\n",
              "      <td>organism-based format</td>\n",
              "      <td>Cc1c(-c2ccc(O)cc2)n(Cc2ccc(OCCN3CCCCCC3)cc2)c2...</td>\n",
              "      <td>None</td>\n",
              "      <td>None</td>\n",
              "      <td>CHEMBL4303084</td>\n",
              "      <td>None</td>\n",
              "      <td>2020</td>\n",
              "      <td>None</td>\n",
              "      <td>CHEMBL46740</td>\n",
              "      <td>BAZEDOXIFENE</td>\n",
              "      <td>CHEMBL46740</td>\n",
              "      <td>5.46</td>\n",
              "      <td>False</td>\n",
              "      <td>http://www.openphacts.org/units/Nanomolar</td>\n",
              "      <td>3133936</td>\n",
              "      <td>=</td>\n",
              "      <td>52</td>\n",
              "      <td>True</td>\n",
              "      <td>=</td>\n",
              "      <td>None</td>\n",
              "      <td>IC50</td>\n",
              "      <td>nM</td>\n",
              "      <td>None</td>\n",
              "      <td>3440.0</td>\n",
              "      <td>CHEMBL4303835</td>\n",
              "      <td>Severe acute respiratory syndrome coronavirus 2</td>\n",
              "      <td>SARS-CoV-2</td>\n",
              "      <td>2697049</td>\n",
              "      <td>None</td>\n",
              "      <td>None</td>\n",
              "      <td>IC50</td>\n",
              "      <td>uM</td>\n",
              "      <td>UO_0000065</td>\n",
              "      <td>None</td>\n",
              "      <td>3.44</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>4</th>\n",
              "      <td>None</td>\n",
              "      <td>18827179</td>\n",
              "      <td>[]</td>\n",
              "      <td>CHEMBL4303812</td>\n",
              "      <td>Antiviral activity against SARS-CoV-2 (viral t...</td>\n",
              "      <td>F</td>\n",
              "      <td>None</td>\n",
              "      <td>None</td>\n",
              "      <td>BAO_0000190</td>\n",
              "      <td>BAO_0000218</td>\n",
              "      <td>organism-based format</td>\n",
              "      <td>COc1cc2c3cc1Oc1c(OC)c(OC)cc4c1[C@@H](Cc1ccc(O)...</td>\n",
              "      <td>None</td>\n",
              "      <td>None</td>\n",
              "      <td>CHEMBL4303084</td>\n",
              "      <td>None</td>\n",
              "      <td>2020</td>\n",
              "      <td>None</td>\n",
              "      <td>CHEMBL504323</td>\n",
              "      <td>BERBAMINE</td>\n",
              "      <td>CHEMBL504323</td>\n",
              "      <td>5.10</td>\n",
              "      <td>False</td>\n",
              "      <td>http://www.openphacts.org/units/Nanomolar</td>\n",
              "      <td>3133937</td>\n",
              "      <td>=</td>\n",
              "      <td>52</td>\n",
              "      <td>True</td>\n",
              "      <td>=</td>\n",
              "      <td>None</td>\n",
              "      <td>IC50</td>\n",
              "      <td>nM</td>\n",
              "      <td>None</td>\n",
              "      <td>7870.0</td>\n",
              "      <td>CHEMBL4303835</td>\n",
              "      <td>Severe acute respiratory syndrome coronavirus 2</td>\n",
              "      <td>SARS-CoV-2</td>\n",
              "      <td>2697049</td>\n",
              "      <td>None</td>\n",
              "      <td>None</td>\n",
              "      <td>IC50</td>\n",
              "      <td>uM</td>\n",
              "      <td>UO_0000065</td>\n",
              "      <td>None</td>\n",
              "      <td>7.87</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>...</th>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>364</th>\n",
              "      <td>None</td>\n",
              "      <td>20154346</td>\n",
              "      <td>[]</td>\n",
              "      <td>CHEMBL4513083</td>\n",
              "      <td>Determination of IC50 values for inhibition of...</td>\n",
              "      <td>F</td>\n",
              "      <td>None</td>\n",
              "      <td>None</td>\n",
              "      <td>BAO_0000190</td>\n",
              "      <td>BAO_0000218</td>\n",
              "      <td>organism-based format</td>\n",
              "      <td>c1cncc(CN2CCC(n3ncc4c(N5CCOCC5)nc(-c5ccc6[nH]c...</td>\n",
              "      <td>None</td>\n",
              "      <td>None</td>\n",
              "      <td>CHEMBL4495565</td>\n",
              "      <td>None</td>\n",
              "      <td>2020</td>\n",
              "      <td>None</td>\n",
              "      <td>CHEMBL583194</td>\n",
              "      <td>None</td>\n",
              "      <td>CHEMBL583194</td>\n",
              "      <td>5.03</td>\n",
              "      <td>False</td>\n",
              "      <td>http://www.openphacts.org/units/Nanomolar</td>\n",
              "      <td>3367857</td>\n",
              "      <td>=</td>\n",
              "      <td>52</td>\n",
              "      <td>True</td>\n",
              "      <td>=</td>\n",
              "      <td>None</td>\n",
              "      <td>IC50</td>\n",
              "      <td>nM</td>\n",
              "      <td>None</td>\n",
              "      <td>9430.0</td>\n",
              "      <td>CHEMBL4303835</td>\n",
              "      <td>Severe acute respiratory syndrome coronavirus 2</td>\n",
              "      <td>SARS-CoV-2</td>\n",
              "      <td>2697049</td>\n",
              "      <td>None</td>\n",
              "      <td>None</td>\n",
              "      <td>IC50</td>\n",
              "      <td>uM</td>\n",
              "      <td>UO_0000065</td>\n",
              "      <td>None</td>\n",
              "      <td>9.43</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>365</th>\n",
              "      <td>None</td>\n",
              "      <td>20154347</td>\n",
              "      <td>[]</td>\n",
              "      <td>CHEMBL4513083</td>\n",
              "      <td>Determination of IC50 values for inhibition of...</td>\n",
              "      <td>F</td>\n",
              "      <td>None</td>\n",
              "      <td>None</td>\n",
              "      <td>BAO_0000190</td>\n",
              "      <td>BAO_0000218</td>\n",
              "      <td>organism-based format</td>\n",
              "      <td>CCCCn1c(NC(=O)c2cccs2)c(C#N)c2nc3ccccc3nc21</td>\n",
              "      <td>None</td>\n",
              "      <td>None</td>\n",
              "      <td>CHEMBL4495565</td>\n",
              "      <td>None</td>\n",
              "      <td>2020</td>\n",
              "      <td>None</td>\n",
              "      <td>CHEMBL3967196</td>\n",
              "      <td>None</td>\n",
              "      <td>CHEMBL3967196</td>\n",
              "      <td>4.71</td>\n",
              "      <td>False</td>\n",
              "      <td>http://www.openphacts.org/units/Nanomolar</td>\n",
              "      <td>3362297</td>\n",
              "      <td>=</td>\n",
              "      <td>52</td>\n",
              "      <td>True</td>\n",
              "      <td>=</td>\n",
              "      <td>None</td>\n",
              "      <td>IC50</td>\n",
              "      <td>nM</td>\n",
              "      <td>None</td>\n",
              "      <td>19650.0</td>\n",
              "      <td>CHEMBL4303835</td>\n",
              "      <td>Severe acute respiratory syndrome coronavirus 2</td>\n",
              "      <td>SARS-CoV-2</td>\n",
              "      <td>2697049</td>\n",
              "      <td>None</td>\n",
              "      <td>None</td>\n",
              "      <td>IC50</td>\n",
              "      <td>uM</td>\n",
              "      <td>UO_0000065</td>\n",
              "      <td>None</td>\n",
              "      <td>19.65</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>367</th>\n",
              "      <td>None</td>\n",
              "      <td>20154349</td>\n",
              "      <td>[]</td>\n",
              "      <td>CHEMBL4513083</td>\n",
              "      <td>Determination of IC50 values for inhibition of...</td>\n",
              "      <td>F</td>\n",
              "      <td>None</td>\n",
              "      <td>None</td>\n",
              "      <td>BAO_0000190</td>\n",
              "      <td>BAO_0000218</td>\n",
              "      <td>organism-based format</td>\n",
              "      <td>CNC(=O)Nc1ccc(-c2nc(N3CC4CCC(C3)O4)c3cnn(C4CCC...</td>\n",
              "      <td>None</td>\n",
              "      <td>None</td>\n",
              "      <td>CHEMBL4495565</td>\n",
              "      <td>None</td>\n",
              "      <td>2020</td>\n",
              "      <td>None</td>\n",
              "      <td>CHEMBL601661</td>\n",
              "      <td>None</td>\n",
              "      <td>CHEMBL601661</td>\n",
              "      <td>4.67</td>\n",
              "      <td>False</td>\n",
              "      <td>http://www.openphacts.org/units/Nanomolar</td>\n",
              "      <td>3363375</td>\n",
              "      <td>=</td>\n",
              "      <td>52</td>\n",
              "      <td>True</td>\n",
              "      <td>=</td>\n",
              "      <td>None</td>\n",
              "      <td>IC50</td>\n",
              "      <td>nM</td>\n",
              "      <td>None</td>\n",
              "      <td>21620.0</td>\n",
              "      <td>CHEMBL4303835</td>\n",
              "      <td>Severe acute respiratory syndrome coronavirus 2</td>\n",
              "      <td>SARS-CoV-2</td>\n",
              "      <td>2697049</td>\n",
              "      <td>None</td>\n",
              "      <td>None</td>\n",
              "      <td>IC50</td>\n",
              "      <td>uM</td>\n",
              "      <td>UO_0000065</td>\n",
              "      <td>None</td>\n",
              "      <td>21.62</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>368</th>\n",
              "      <td>None</td>\n",
              "      <td>20154350</td>\n",
              "      <td>[]</td>\n",
              "      <td>CHEMBL4513083</td>\n",
              "      <td>Determination of IC50 values for inhibition of...</td>\n",
              "      <td>F</td>\n",
              "      <td>None</td>\n",
              "      <td>None</td>\n",
              "      <td>BAO_0000190</td>\n",
              "      <td>BAO_0000218</td>\n",
              "      <td>organism-based format</td>\n",
              "      <td>CC(C)N1CCN(c2ccc(C(=O)c3c(-c4ccc(O)cc4)sc4cc(O...</td>\n",
              "      <td>None</td>\n",
              "      <td>None</td>\n",
              "      <td>CHEMBL4495565</td>\n",
              "      <td>None</td>\n",
              "      <td>2020</td>\n",
              "      <td>None</td>\n",
              "      <td>CHEMBL178334</td>\n",
              "      <td>None</td>\n",
              "      <td>CHEMBL178334</td>\n",
              "      <td>4.53</td>\n",
              "      <td>False</td>\n",
              "      <td>http://www.openphacts.org/units/Nanomolar</td>\n",
              "      <td>3368236</td>\n",
              "      <td>=</td>\n",
              "      <td>52</td>\n",
              "      <td>True</td>\n",
              "      <td>=</td>\n",
              "      <td>None</td>\n",
              "      <td>IC50</td>\n",
              "      <td>nM</td>\n",
              "      <td>None</td>\n",
              "      <td>29870.0</td>\n",
              "      <td>CHEMBL4303835</td>\n",
              "      <td>Severe acute respiratory syndrome coronavirus 2</td>\n",
              "      <td>SARS-CoV-2</td>\n",
              "      <td>2697049</td>\n",
              "      <td>None</td>\n",
              "      <td>None</td>\n",
              "      <td>IC50</td>\n",
              "      <td>uM</td>\n",
              "      <td>UO_0000065</td>\n",
              "      <td>None</td>\n",
              "      <td>29.87</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>369</th>\n",
              "      <td>None</td>\n",
              "      <td>20154351</td>\n",
              "      <td>[]</td>\n",
              "      <td>CHEMBL4513083</td>\n",
              "      <td>Determination of IC50 values for inhibition of...</td>\n",
              "      <td>F</td>\n",
              "      <td>None</td>\n",
              "      <td>None</td>\n",
              "      <td>BAO_0000190</td>\n",
              "      <td>BAO_0000218</td>\n",
              "      <td>organism-based format</td>\n",
              "      <td>Nc1ccc(C(=O)Nc2cccc(-c3nc(N4CCOCC4)c4oc5ncccc5...</td>\n",
              "      <td>None</td>\n",
              "      <td>None</td>\n",
              "      <td>CHEMBL4495565</td>\n",
              "      <td>None</td>\n",
              "      <td>2020</td>\n",
              "      <td>None</td>\n",
              "      <td>CHEMBL2178735</td>\n",
              "      <td>None</td>\n",
              "      <td>CHEMBL2178735</td>\n",
              "      <td>5.33</td>\n",
              "      <td>False</td>\n",
              "      <td>http://www.openphacts.org/units/Nanomolar</td>\n",
              "      <td>3366145</td>\n",
              "      <td>=</td>\n",
              "      <td>52</td>\n",
              "      <td>True</td>\n",
              "      <td>=</td>\n",
              "      <td>None</td>\n",
              "      <td>IC50</td>\n",
              "      <td>nM</td>\n",
              "      <td>None</td>\n",
              "      <td>4730.0</td>\n",
              "      <td>CHEMBL4303835</td>\n",
              "      <td>Severe acute respiratory syndrome coronavirus 2</td>\n",
              "      <td>SARS-CoV-2</td>\n",
              "      <td>2697049</td>\n",
              "      <td>None</td>\n",
              "      <td>None</td>\n",
              "      <td>IC50</td>\n",
              "      <td>uM</td>\n",
              "      <td>UO_0000065</td>\n",
              "      <td>None</td>\n",
              "      <td>4.73</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "<p>286 rows × 45 columns</p>\n",
              "</div>\n",
              "      <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-84115d41-2f6b-4a58-9622-da29e5b0d6fe')\"\n",
              "              title=\"Convert this dataframe to an interactive table.\"\n",
              "              style=\"display:none;\">\n",
              "        \n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "       width=\"24px\">\n",
              "    <path d=\"M0 0h24v24H0V0z\" fill=\"none\"/>\n",
              "    <path d=\"M18.56 5.44l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94zm-11 1L8.5 8.5l.94-2.06 2.06-.94-2.06-.94L8.5 2.5l-.94 2.06-2.06.94zm10 10l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94z\"/><path d=\"M17.41 7.96l-1.37-1.37c-.4-.4-.92-.59-1.43-.59-.52 0-1.04.2-1.43.59L10.3 9.45l-7.72 7.72c-.78.78-.78 2.05 0 2.83L4 21.41c.39.39.9.59 1.41.59.51 0 1.02-.2 1.41-.59l7.78-7.78 2.81-2.81c.8-.78.8-2.07 0-2.86zM5.41 20L4 18.59l7.72-7.72 1.47 1.35L5.41 20z\"/>\n",
              "  </svg>\n",
              "      </button>\n",
              "      \n",
              "  <style>\n",
              "    .colab-df-container {\n",
              "      display:flex;\n",
              "      flex-wrap:wrap;\n",
              "      gap: 12px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert {\n",
              "      background-color: #E8F0FE;\n",
              "      border: none;\n",
              "      border-radius: 50%;\n",
              "      cursor: pointer;\n",
              "      display: none;\n",
              "      fill: #1967D2;\n",
              "      height: 32px;\n",
              "      padding: 0 0 0 0;\n",
              "      width: 32px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert:hover {\n",
              "      background-color: #E2EBFA;\n",
              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "      fill: #174EA6;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert {\n",
              "      background-color: #3B4455;\n",
              "      fill: #D2E3FC;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert:hover {\n",
              "      background-color: #434B5C;\n",
              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "      fill: #FFFFFF;\n",
              "    }\n",
              "  </style>\n",
              "\n",
              "      <script>\n",
              "        const buttonEl =\n",
              "          document.querySelector('#df-84115d41-2f6b-4a58-9622-da29e5b0d6fe button.colab-df-convert');\n",
              "        buttonEl.style.display =\n",
              "          google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "        async function convertToInteractive(key) {\n",
              "          const element = document.querySelector('#df-84115d41-2f6b-4a58-9622-da29e5b0d6fe');\n",
              "          const dataTable =\n",
              "            await google.colab.kernel.invokeFunction('convertToInteractive',\n",
              "                                                     [key], {});\n",
              "          if (!dataTable) return;\n",
              "\n",
              "          const docLinkHtml = 'Like what you see? Visit the ' +\n",
              "            '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
              "            + ' to learn more about interactive tables.';\n",
              "          element.innerHTML = '';\n",
              "          dataTable['output_type'] = 'display_data';\n",
              "          await google.colab.output.renderOutput(dataTable, element);\n",
              "          const docLink = document.createElement('div');\n",
              "          docLink.innerHTML = docLinkHtml;\n",
              "          element.appendChild(docLink);\n",
              "        }\n",
              "      </script>\n",
              "    </div>\n",
              "  </div>\n",
              "  "
            ],
            "text/plain": [
              "    activity_comment  activity_id  ... upper_value  value\n",
              "0               None     18827175  ...        None   6.62\n",
              "1               None     18827176  ...        None   5.15\n",
              "2               None     18827177  ...        None   4.64\n",
              "3               None     18827178  ...        None   3.44\n",
              "4               None     18827179  ...        None   7.87\n",
              "..               ...          ...  ...         ...    ...\n",
              "364             None     20154346  ...        None   9.43\n",
              "365             None     20154347  ...        None  19.65\n",
              "367             None     20154349  ...        None  21.62\n",
              "368             None     20154350  ...        None  29.87\n",
              "369             None     20154351  ...        None   4.73\n",
              "\n",
              "[286 rows x 45 columns]"
            ]
          },
          "metadata": {},
          "execution_count": 15
        }
      ],
      "source": [
        "df2 = df[df.pchembl_value.notna()]\n",
        "df2"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "5H4sSFAWhV9B"
      },
      "source": [
        "## **Data pre-processing of the bioactivity data**"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "tO22XVlzhkXR"
      },
      "source": [
        "### **Labeling compounds as either being active, inactive or intermediate**\n",
        "The bioactivity data is in the IC50 unit. Compounds having values of less than 1000 nM will be considered to be **active** while those greater than 10,000 nM will be considered to be **inactive**. As for those values in between 1,000 and 10,000 nM will be referred to as **intermediate**. "
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 16,
      "metadata": {
        "id": "1E8rz7oMOd-5"
      },
      "outputs": [],
      "source": [
        "bioactivity_class = []\n",
        "for i in df2.standard_value:\n",
        "  if float(i) >= 10000:\n",
        "    bioactivity_class.append(\"inactive\")\n",
        "  elif float(i) <= 1000:\n",
        "    bioactivity_class.append(\"active\")\n",
        "  else:\n",
        "    bioactivity_class.append(\"intermediate\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Nv2dzid_hzKd"
      },
      "source": [
        "### **Combine the 3 columns (molecule_chembl_id,canonical_smiles,standard_value) and bioactivity_class into a DataFrame**"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 17,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 424
        },
        "id": "0NCLYmrASgha",
        "outputId": "4d998e14-427b-426f-eb92-e3cd881f0f9f"
      },
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "\n",
              "  <div id=\"df-176dfcdd-5885-4d43-90f8-0fc50ed3b3a9\">\n",
              "    <div class=\"colab-df-container\">\n",
              "      <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>molecule_chembl_id</th>\n",
              "      <th>canonical_smiles</th>\n",
              "      <th>standard_value</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>CHEMBL3301610</td>\n",
              "      <td>CCN1CCN(Cc2ccc(Nc3ncc(F)c(-c4cc(F)c5nc(C)n(C(C...</td>\n",
              "      <td>6620.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>1</th>\n",
              "      <td>CHEMBL682</td>\n",
              "      <td>CCN(CC)Cc1cc(Nc2ccnc3cc(Cl)ccc23)ccc1O</td>\n",
              "      <td>5150.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2</th>\n",
              "      <td>CHEMBL264241</td>\n",
              "      <td>CCCCCOc1ccc(-c2ccc(-c3ccc(C(=O)N[C@H]4C[C@@H](...</td>\n",
              "      <td>4640.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>3</th>\n",
              "      <td>CHEMBL46740</td>\n",
              "      <td>Cc1c(-c2ccc(O)cc2)n(Cc2ccc(OCCN3CCCCCC3)cc2)c2...</td>\n",
              "      <td>3440.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>4</th>\n",
              "      <td>CHEMBL504323</td>\n",
              "      <td>COc1cc2c3cc1Oc1c(OC)c(OC)cc4c1[C@@H](Cc1ccc(O)...</td>\n",
              "      <td>7870.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>...</th>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>364</th>\n",
              "      <td>CHEMBL583194</td>\n",
              "      <td>c1cncc(CN2CCC(n3ncc4c(N5CCOCC5)nc(-c5ccc6[nH]c...</td>\n",
              "      <td>9430.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>365</th>\n",
              "      <td>CHEMBL3967196</td>\n",
              "      <td>CCCCn1c(NC(=O)c2cccs2)c(C#N)c2nc3ccccc3nc21</td>\n",
              "      <td>19650.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>367</th>\n",
              "      <td>CHEMBL601661</td>\n",
              "      <td>CNC(=O)Nc1ccc(-c2nc(N3CC4CCC(C3)O4)c3cnn(C4CCC...</td>\n",
              "      <td>21620.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>368</th>\n",
              "      <td>CHEMBL178334</td>\n",
              "      <td>CC(C)N1CCN(c2ccc(C(=O)c3c(-c4ccc(O)cc4)sc4cc(O...</td>\n",
              "      <td>29870.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>369</th>\n",
              "      <td>CHEMBL2178735</td>\n",
              "      <td>Nc1ccc(C(=O)Nc2cccc(-c3nc(N4CCOCC4)c4oc5ncccc5...</td>\n",
              "      <td>4730.0</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "<p>286 rows × 3 columns</p>\n",
              "</div>\n",
              "      <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-176dfcdd-5885-4d43-90f8-0fc50ed3b3a9')\"\n",
              "              title=\"Convert this dataframe to an interactive table.\"\n",
              "              style=\"display:none;\">\n",
              "        \n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "       width=\"24px\">\n",
              "    <path d=\"M0 0h24v24H0V0z\" fill=\"none\"/>\n",
              "    <path d=\"M18.56 5.44l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94zm-11 1L8.5 8.5l.94-2.06 2.06-.94-2.06-.94L8.5 2.5l-.94 2.06-2.06.94zm10 10l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94z\"/><path d=\"M17.41 7.96l-1.37-1.37c-.4-.4-.92-.59-1.43-.59-.52 0-1.04.2-1.43.59L10.3 9.45l-7.72 7.72c-.78.78-.78 2.05 0 2.83L4 21.41c.39.39.9.59 1.41.59.51 0 1.02-.2 1.41-.59l7.78-7.78 2.81-2.81c.8-.78.8-2.07 0-2.86zM5.41 20L4 18.59l7.72-7.72 1.47 1.35L5.41 20z\"/>\n",
              "  </svg>\n",
              "      </button>\n",
              "      \n",
              "  <style>\n",
              "    .colab-df-container {\n",
              "      display:flex;\n",
              "      flex-wrap:wrap;\n",
              "      gap: 12px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert {\n",
              "      background-color: #E8F0FE;\n",
              "      border: none;\n",
              "      border-radius: 50%;\n",
              "      cursor: pointer;\n",
              "      display: none;\n",
              "      fill: #1967D2;\n",
              "      height: 32px;\n",
              "      padding: 0 0 0 0;\n",
              "      width: 32px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert:hover {\n",
              "      background-color: #E2EBFA;\n",
              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "      fill: #174EA6;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert {\n",
              "      background-color: #3B4455;\n",
              "      fill: #D2E3FC;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert:hover {\n",
              "      background-color: #434B5C;\n",
              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "      fill: #FFFFFF;\n",
              "    }\n",
              "  </style>\n",
              "\n",
              "      <script>\n",
              "        const buttonEl =\n",
              "          document.querySelector('#df-176dfcdd-5885-4d43-90f8-0fc50ed3b3a9 button.colab-df-convert');\n",
              "        buttonEl.style.display =\n",
              "          google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "        async function convertToInteractive(key) {\n",
              "          const element = document.querySelector('#df-176dfcdd-5885-4d43-90f8-0fc50ed3b3a9');\n",
              "          const dataTable =\n",
              "            await google.colab.kernel.invokeFunction('convertToInteractive',\n",
              "                                                     [key], {});\n",
              "          if (!dataTable) return;\n",
              "\n",
              "          const docLinkHtml = 'Like what you see? Visit the ' +\n",
              "            '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
              "            + ' to learn more about interactive tables.';\n",
              "          element.innerHTML = '';\n",
              "          dataTable['output_type'] = 'display_data';\n",
              "          await google.colab.output.renderOutput(dataTable, element);\n",
              "          const docLink = document.createElement('div');\n",
              "          docLink.innerHTML = docLinkHtml;\n",
              "          element.appendChild(docLink);\n",
              "        }\n",
              "      </script>\n",
              "    </div>\n",
              "  </div>\n",
              "  "
            ],
            "text/plain": [
              "    molecule_chembl_id  ... standard_value\n",
              "0        CHEMBL3301610  ...         6620.0\n",
              "1            CHEMBL682  ...         5150.0\n",
              "2         CHEMBL264241  ...         4640.0\n",
              "3          CHEMBL46740  ...         3440.0\n",
              "4         CHEMBL504323  ...         7870.0\n",
              "..                 ...  ...            ...\n",
              "364       CHEMBL583194  ...         9430.0\n",
              "365      CHEMBL3967196  ...        19650.0\n",
              "367       CHEMBL601661  ...        21620.0\n",
              "368       CHEMBL178334  ...        29870.0\n",
              "369      CHEMBL2178735  ...         4730.0\n",
              "\n",
              "[286 rows x 3 columns]"
            ]
          },
          "metadata": {},
          "execution_count": 17
        }
      ],
      "source": [
        "selection = ['molecule_chembl_id','canonical_smiles','standard_value']\n",
        "df3 = df2[selection]\n",
        "df3"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "yz0ae4mNnnr6"
      },
      "source": [
        "# In case there is a mismatch in numbering, we use a temperary file to get rid of this."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 18,
      "metadata": {
        "id": "9W4jf7srzfKa"
      },
      "outputs": [],
      "source": [
        "df3.to_csv('temp.csv',index=False)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 19,
      "metadata": {
        "id": "9sSfFoFIzbkP"
      },
      "outputs": [],
      "source": [
        "df_temp = pd.read_csv('temp.csv')"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 20,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 424
        },
        "id": "Soyv4f22zwMm",
        "outputId": "8de25b4e-c9a9-4285-e877-dd846b1b7e13"
      },
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "\n",
              "  <div id=\"df-0016826d-4fc0-4b95-abd3-b2c1db987b77\">\n",
              "    <div class=\"colab-df-container\">\n",
              "      <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>molecule_chembl_id</th>\n",
              "      <th>canonical_smiles</th>\n",
              "      <th>standard_value</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>CHEMBL3301610</td>\n",
              "      <td>CCN1CCN(Cc2ccc(Nc3ncc(F)c(-c4cc(F)c5nc(C)n(C(C...</td>\n",
              "      <td>6620.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>1</th>\n",
              "      <td>CHEMBL682</td>\n",
              "      <td>CCN(CC)Cc1cc(Nc2ccnc3cc(Cl)ccc23)ccc1O</td>\n",
              "      <td>5150.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2</th>\n",
              "      <td>CHEMBL264241</td>\n",
              "      <td>CCCCCOc1ccc(-c2ccc(-c3ccc(C(=O)N[C@H]4C[C@@H](...</td>\n",
              "      <td>4640.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>3</th>\n",
              "      <td>CHEMBL46740</td>\n",
              "      <td>Cc1c(-c2ccc(O)cc2)n(Cc2ccc(OCCN3CCCCCC3)cc2)c2...</td>\n",
              "      <td>3440.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>4</th>\n",
              "      <td>CHEMBL504323</td>\n",
              "      <td>COc1cc2c3cc1Oc1c(OC)c(OC)cc4c1[C@@H](Cc1ccc(O)...</td>\n",
              "      <td>7870.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>...</th>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>281</th>\n",
              "      <td>CHEMBL583194</td>\n",
              "      <td>c1cncc(CN2CCC(n3ncc4c(N5CCOCC5)nc(-c5ccc6[nH]c...</td>\n",
              "      <td>9430.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>282</th>\n",
              "      <td>CHEMBL3967196</td>\n",
              "      <td>CCCCn1c(NC(=O)c2cccs2)c(C#N)c2nc3ccccc3nc21</td>\n",
              "      <td>19650.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>283</th>\n",
              "      <td>CHEMBL601661</td>\n",
              "      <td>CNC(=O)Nc1ccc(-c2nc(N3CC4CCC(C3)O4)c3cnn(C4CCC...</td>\n",
              "      <td>21620.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>284</th>\n",
              "      <td>CHEMBL178334</td>\n",
              "      <td>CC(C)N1CCN(c2ccc(C(=O)c3c(-c4ccc(O)cc4)sc4cc(O...</td>\n",
              "      <td>29870.0</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>285</th>\n",
              "      <td>CHEMBL2178735</td>\n",
              "      <td>Nc1ccc(C(=O)Nc2cccc(-c3nc(N4CCOCC4)c4oc5ncccc5...</td>\n",
              "      <td>4730.0</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "<p>286 rows × 3 columns</p>\n",
              "</div>\n",
              "      <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-0016826d-4fc0-4b95-abd3-b2c1db987b77')\"\n",
              "              title=\"Convert this dataframe to an interactive table.\"\n",
              "              style=\"display:none;\">\n",
              "        \n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "       width=\"24px\">\n",
              "    <path d=\"M0 0h24v24H0V0z\" fill=\"none\"/>\n",
              "    <path d=\"M18.56 5.44l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94zm-11 1L8.5 8.5l.94-2.06 2.06-.94-2.06-.94L8.5 2.5l-.94 2.06-2.06.94zm10 10l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94z\"/><path d=\"M17.41 7.96l-1.37-1.37c-.4-.4-.92-.59-1.43-.59-.52 0-1.04.2-1.43.59L10.3 9.45l-7.72 7.72c-.78.78-.78 2.05 0 2.83L4 21.41c.39.39.9.59 1.41.59.51 0 1.02-.2 1.41-.59l7.78-7.78 2.81-2.81c.8-.78.8-2.07 0-2.86zM5.41 20L4 18.59l7.72-7.72 1.47 1.35L5.41 20z\"/>\n",
              "  </svg>\n",
              "      </button>\n",
              "      \n",
              "  <style>\n",
              "    .colab-df-container {\n",
              "      display:flex;\n",
              "      flex-wrap:wrap;\n",
              "      gap: 12px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert {\n",
              "      background-color: #E8F0FE;\n",
              "      border: none;\n",
              "      border-radius: 50%;\n",
              "      cursor: pointer;\n",
              "      display: none;\n",
              "      fill: #1967D2;\n",
              "      height: 32px;\n",
              "      padding: 0 0 0 0;\n",
              "      width: 32px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert:hover {\n",
              "      background-color: #E2EBFA;\n",
              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "      fill: #174EA6;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert {\n",
              "      background-color: #3B4455;\n",
              "      fill: #D2E3FC;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert:hover {\n",
              "      background-color: #434B5C;\n",
              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "      fill: #FFFFFF;\n",
              "    }\n",
              "  </style>\n",
              "\n",
              "      <script>\n",
              "        const buttonEl =\n",
              "          document.querySelector('#df-0016826d-4fc0-4b95-abd3-b2c1db987b77 button.colab-df-convert');\n",
              "        buttonEl.style.display =\n",
              "          google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "        async function convertToInteractive(key) {\n",
              "          const element = document.querySelector('#df-0016826d-4fc0-4b95-abd3-b2c1db987b77');\n",
              "          const dataTable =\n",
              "            await google.colab.kernel.invokeFunction('convertToInteractive',\n",
              "                                                     [key], {});\n",
              "          if (!dataTable) return;\n",
              "\n",
              "          const docLinkHtml = 'Like what you see? Visit the ' +\n",
              "            '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
              "            + ' to learn more about interactive tables.';\n",
              "          element.innerHTML = '';\n",
              "          dataTable['output_type'] = 'display_data';\n",
              "          await google.colab.output.renderOutput(dataTable, element);\n",
              "          const docLink = document.createElement('div');\n",
              "          docLink.innerHTML = docLinkHtml;\n",
              "          element.appendChild(docLink);\n",
              "        }\n",
              "      </script>\n",
              "    </div>\n",
              "  </div>\n",
              "  "
            ],
            "text/plain": [
              "    molecule_chembl_id  ... standard_value\n",
              "0        CHEMBL3301610  ...         6620.0\n",
              "1            CHEMBL682  ...         5150.0\n",
              "2         CHEMBL264241  ...         4640.0\n",
              "3          CHEMBL46740  ...         3440.0\n",
              "4         CHEMBL504323  ...         7870.0\n",
              "..                 ...  ...            ...\n",
              "281       CHEMBL583194  ...         9430.0\n",
              "282      CHEMBL3967196  ...        19650.0\n",
              "283       CHEMBL601661  ...        21620.0\n",
              "284       CHEMBL178334  ...        29870.0\n",
              "285      CHEMBL2178735  ...         4730.0\n",
              "\n",
              "[286 rows x 3 columns]"
            ]
          },
          "metadata": {},
          "execution_count": 20
        }
      ],
      "source": [
        "df_temp"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 21,
      "metadata": {
        "id": "HWMtx5J8z2Wq"
      },
      "outputs": [],
      "source": [
        "df3 = df_temp"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 22,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 424
        },
        "id": "Li64nUiZQ-y2",
        "outputId": "25ccc893-7eb2-4f53-ba26-4b0a099f65d8"
      },
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/html": [
              "\n",
              "  <div id=\"df-b85a4339-b86e-4093-bbee-6684c2b09b67\">\n",
              "    <div class=\"colab-df-container\">\n",
              "      <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>molecule_chembl_id</th>\n",
              "      <th>canonical_smiles</th>\n",
              "      <th>standard_value</th>\n",
              "      <th>bioactivity_class</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>CHEMBL3301610</td>\n",
              "      <td>CCN1CCN(Cc2ccc(Nc3ncc(F)c(-c4cc(F)c5nc(C)n(C(C...</td>\n",
              "      <td>6620.0</td>\n",
              "      <td>intermediate</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>1</th>\n",
              "      <td>CHEMBL682</td>\n",
              "      <td>CCN(CC)Cc1cc(Nc2ccnc3cc(Cl)ccc23)ccc1O</td>\n",
              "      <td>5150.0</td>\n",
              "      <td>intermediate</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2</th>\n",
              "      <td>CHEMBL264241</td>\n",
              "      <td>CCCCCOc1ccc(-c2ccc(-c3ccc(C(=O)N[C@H]4C[C@@H](...</td>\n",
              "      <td>4640.0</td>\n",
              "      <td>intermediate</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>3</th>\n",
              "      <td>CHEMBL46740</td>\n",
              "      <td>Cc1c(-c2ccc(O)cc2)n(Cc2ccc(OCCN3CCCCCC3)cc2)c2...</td>\n",
              "      <td>3440.0</td>\n",
              "      <td>intermediate</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>4</th>\n",
              "      <td>CHEMBL504323</td>\n",
              "      <td>COc1cc2c3cc1Oc1c(OC)c(OC)cc4c1[C@@H](Cc1ccc(O)...</td>\n",
              "      <td>7870.0</td>\n",
              "      <td>intermediate</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>...</th>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "      <td>...</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>281</th>\n",
              "      <td>CHEMBL583194</td>\n",
              "      <td>c1cncc(CN2CCC(n3ncc4c(N5CCOCC5)nc(-c5ccc6[nH]c...</td>\n",
              "      <td>9430.0</td>\n",
              "      <td>intermediate</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>282</th>\n",
              "      <td>CHEMBL3967196</td>\n",
              "      <td>CCCCn1c(NC(=O)c2cccs2)c(C#N)c2nc3ccccc3nc21</td>\n",
              "      <td>19650.0</td>\n",
              "      <td>inactive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>283</th>\n",
              "      <td>CHEMBL601661</td>\n",
              "      <td>CNC(=O)Nc1ccc(-c2nc(N3CC4CCC(C3)O4)c3cnn(C4CCC...</td>\n",
              "      <td>21620.0</td>\n",
              "      <td>inactive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>284</th>\n",
              "      <td>CHEMBL178334</td>\n",
              "      <td>CC(C)N1CCN(c2ccc(C(=O)c3c(-c4ccc(O)cc4)sc4cc(O...</td>\n",
              "      <td>29870.0</td>\n",
              "      <td>inactive</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>285</th>\n",
              "      <td>CHEMBL2178735</td>\n",
              "      <td>Nc1ccc(C(=O)Nc2cccc(-c3nc(N4CCOCC4)c4oc5ncccc5...</td>\n",
              "      <td>4730.0</td>\n",
              "      <td>intermediate</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "<p>286 rows × 4 columns</p>\n",
              "</div>\n",
              "      <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-b85a4339-b86e-4093-bbee-6684c2b09b67')\"\n",
              "              title=\"Convert this dataframe to an interactive table.\"\n",
              "              style=\"display:none;\">\n",
              "        \n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "       width=\"24px\">\n",
              "    <path d=\"M0 0h24v24H0V0z\" fill=\"none\"/>\n",
              "    <path d=\"M18.56 5.44l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94zm-11 1L8.5 8.5l.94-2.06 2.06-.94-2.06-.94L8.5 2.5l-.94 2.06-2.06.94zm10 10l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94z\"/><path d=\"M17.41 7.96l-1.37-1.37c-.4-.4-.92-.59-1.43-.59-.52 0-1.04.2-1.43.59L10.3 9.45l-7.72 7.72c-.78.78-.78 2.05 0 2.83L4 21.41c.39.39.9.59 1.41.59.51 0 1.02-.2 1.41-.59l7.78-7.78 2.81-2.81c.8-.78.8-2.07 0-2.86zM5.41 20L4 18.59l7.72-7.72 1.47 1.35L5.41 20z\"/>\n",
              "  </svg>\n",
              "      </button>\n",
              "      \n",
              "  <style>\n",
              "    .colab-df-container {\n",
              "      display:flex;\n",
              "      flex-wrap:wrap;\n",
              "      gap: 12px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert {\n",
              "      background-color: #E8F0FE;\n",
              "      border: none;\n",
              "      border-radius: 50%;\n",
              "      cursor: pointer;\n",
              "      display: none;\n",
              "      fill: #1967D2;\n",
              "      height: 32px;\n",
              "      padding: 0 0 0 0;\n",
              "      width: 32px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert:hover {\n",
              "      background-color: #E2EBFA;\n",
              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "      fill: #174EA6;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert {\n",
              "      background-color: #3B4455;\n",
              "      fill: #D2E3FC;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert:hover {\n",
              "      background-color: #434B5C;\n",
              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "      fill: #FFFFFF;\n",
              "    }\n",
              "  </style>\n",
              "\n",
              "      <script>\n",
              "        const buttonEl =\n",
              "          document.querySelector('#df-b85a4339-b86e-4093-bbee-6684c2b09b67 button.colab-df-convert');\n",
              "        buttonEl.style.display =\n",
              "          google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "        async function convertToInteractive(key) {\n",
              "          const element = document.querySelector('#df-b85a4339-b86e-4093-bbee-6684c2b09b67');\n",
              "          const dataTable =\n",
              "            await google.colab.kernel.invokeFunction('convertToInteractive',\n",
              "                                                     [key], {});\n",
              "          if (!dataTable) return;\n",
              "\n",
              "          const docLinkHtml = 'Like what you see? Visit the ' +\n",
              "            '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
              "            + ' to learn more about interactive tables.';\n",
              "          element.innerHTML = '';\n",
              "          dataTable['output_type'] = 'display_data';\n",
              "          await google.colab.output.renderOutput(dataTable, element);\n",
              "          const docLink = document.createElement('div');\n",
              "          docLink.innerHTML = docLinkHtml;\n",
              "          element.appendChild(docLink);\n",
              "        }\n",
              "      </script>\n",
              "    </div>\n",
              "  </div>\n",
              "  "
            ],
            "text/plain": [
              "    molecule_chembl_id  ... bioactivity_class\n",
              "0        CHEMBL3301610  ...      intermediate\n",
              "1            CHEMBL682  ...      intermediate\n",
              "2         CHEMBL264241  ...      intermediate\n",
              "3          CHEMBL46740  ...      intermediate\n",
              "4         CHEMBL504323  ...      intermediate\n",
              "..                 ...  ...               ...\n",
              "281       CHEMBL583194  ...      intermediate\n",
              "282      CHEMBL3967196  ...          inactive\n",
              "283       CHEMBL601661  ...          inactive\n",
              "284       CHEMBL178334  ...          inactive\n",
              "285      CHEMBL2178735  ...      intermediate\n",
              "\n",
              "[286 rows x 4 columns]"
            ]
          },
          "metadata": {},
          "execution_count": 22
        }
      ],
      "source": [
        "bioactivity_class = pd.Series(bioactivity_class, name='bioactivity_class')\n",
        "df4 = pd.concat([df3, bioactivity_class], axis=1)\n",
        "df4"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "9tlgyexWh7YJ"
      },
      "source": [
        "Saves dataframe to CSV file"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 23,
      "metadata": {
        "id": "nSNia7suXstR"
      },
      "outputs": [],
      "source": [
        "df4.to_csv('sars-cov-2_bioactivity_data_preprocessed.csv', index=False)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 24,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "UuZf5-MEd-H5",
        "outputId": "fab51178-e04f-46a0-d348-7ba6a7fbbbc0"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "total 264\n",
            "drwxr-xr-x 1 root root   4096 Dec 23 14:32 sample_data\n",
            "-rw-r--r-- 1 root root  26606 Jan  9 11:01 sars-cov-2_bioactivity_data_preprocessed.csv\n",
            "-rw-r--r-- 1 root root 209411 Jan  9 10:58 sars-cov-2_bioactivity_data_raw.csv\n",
            "-rw-r--r-- 1 root root  23530 Jan  9 11:01 temp.csv\n"
          ]
        }
      ],
      "source": [
        "! ls -l"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ZywB5K_Dlawb"
      },
      "source": [
        "---"
      ]
    }
  ],
  "metadata": {
    "colab": {
      "collapsed_sections": [],
      "name": "1_public.ipynb",
      "provenance": []
    },
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.8.8"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}